Server Admin Log

Revision as of 23:50, 6 May 2021 by imported>Stashbot (brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193))

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart
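
The 03:08 [Elastic] entries above lift the node bans by sending a transient cluster-settings update whose exclusion filters are null. As a minimal sketch only, the request body could be built as below. The helper name `allocation_exclude_body` and its parameters are hypothetical, and the example node name is taken from the log purely for illustration; the log does not say which filter the original bans used.

```python
import json


def allocation_exclude_body(hosts=None, names=None):
    """Build a JSON body for PUT /_cluster/settings (hypothetical helper).

    A filter set to None is serialized as JSON null, which is the shape
    used in the 03:08 entry to clear the exclusions.
    """
    def join(values):
        # Several entries are passed to Elasticsearch as one comma-separated string.
        return ",".join(values) if values else None

    return json.dumps({
        "transient": {
            "cluster.routing.allocation.exclude": {
                "_host": join(hosts),
                "_name": join(names),
            }
        }
    })


# Clearing all exclusions, as in the 03:08 entry.
print(allocation_exclude_body())

# Assumption: excluding a single node by name, for illustration only.
print(allocation_exclude_body(names=["elastic2043"]))
```

The body only describes the desired setting; sending it (for example with the `curl -XPUT` call shown in the log) is left out on purpose.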

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" () (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
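
The `(re)pooling @ N%` entries above record databases being returned to service in stages. A minimal sketch of reconstructing each host's ramp from such entries follows. It assumes the entries are available as plain-text lines from a single day; the function name `repool_steps` and the regular expression are illustrative only and are not part of dbctl or conftool.

```python
import re

# Matches the "(re)pooling @ N%" dbctl commit messages as they appear in this log.
# This pattern is an assumption based on the entries above, not a dbctl format spec.
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>db\d+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} ordered oldest to newest."""
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            entry = (match["time"], int(match["pct"]))
            steps.setdefault(match["host"], []).append(entry)
    # The log is newest-first; sorting HH:MM strings restores chronological
    # order as long as all lines come from the same day.
    return {host: sorted(entries) for host, entries in steps.items()}


# Three entries copied from the 2021-05-04 section (URLs trimmed).
sample = [
    "12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade'",
    "12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade'",
    "12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade'",
]

if __name__ == "__main__":
    for host, entries in repool_steps(sample).items():
        print(host, entries)
```

Depool commits and free-form entries do not match the pattern and are skipped.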

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123) GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

2021-04-30

  • 21:54 mutante: people1003 - rsyncing /home from people1002
  • 15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 15:25 bstorm: hard rebooting cloudmetrics1002 T275605
  • 11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: Update wikitech logo (duration: 00m 56s)
  • 11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: Update wikitech logo (duration: 00m 56s)
  • 11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: Update wikitech logo (duration: 00m 57s)
  • 11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: Update wikitech logo (duration: 00m 57s)
  • 11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: Update wikitech logo (duration: 00m 57s)
  • 09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
  • 09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
  • 08:11 moritzm: remove mc1027 from debmonitor, server is broken and won't return (T276415)
  • 07:38 moritzm: installing iputils updates from Buster point release
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
  • 05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
  • 05:16 marostegui: Upgrade kernel on db1114
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
  • 05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
  • 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
  • 04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
  • 04:43 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 04:42 ryankemper: T261239 `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
  • 04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 04:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
  • 04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
  • 03:50 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:47 ryankemper: T280563 about half of codfw nodes have been rebooted before the failure caused by write queue not emptying fast enough, kicking it off again: `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
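
Cookbook runs are logged as `START` / `END (STATUS)` pairs, and several of the runs above ended in `FAIL`. As a rough sketch, outcomes can be tallied per cookbook from the `END` lines. The helper name `cookbook_outcomes` and the pattern below are hypothetical, inferred from the entries in this log rather than from the cookbook tooling itself.

```python
import re
from collections import Counter

# Matches "END (STATUS) - Cookbook <name> (exit_code=N)" as seen in this log.
# START lines carry no status and are deliberately not matched.
END_RE = re.compile(
    r"END \((?P<status>\w+)\) - Cookbook (?P<name>[\w.-]+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def cookbook_outcomes(lines):
    """Count finished cookbook runs keyed by (cookbook name, status)."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["name"], match["status"])] += 1
    return counts


# Entries copied from the 2021-04-30 section (trailing details trimmed).
sample = [
    "05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade",
    "04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)",
    "04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet",
    "04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet",
]

if __name__ == "__main__":
    for (name, status), count in sorted(cookbook_outcomes(sample).items()):
        print(f"{name}: {status} x{count}")
```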

2021-04-29

  • 23:36 thcipriani@deploy1002: Synchronized README: Config: Revert "DEMO: Add newline to README" (duration: 00m 56s)
  • 23:18 ryankemper: T280563 successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
  • 23:16 thcipriani@deploy1002: Synchronized README: Config: DEMO: Add newline to README (duration: 00m 56s)
  • 23:08 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
  • 23:06 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 22:46 ryankemper: T280563 Current master is `relforge1003-relforge-eqiad`, will reboot `1004` first then `1003` after
  • 22:44 ryankemper: T280563 Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts which no longer exist. Will reboot relforge manually and use the cookbook for codfw/eqiad, and circle back later for the spicerack change
  • 22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 22:32 ryankemper: T280563 Spotted the issue; forgot to set `--without-lvs` for relforge reboot
  • 22:27 ryankemper: T280563 `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
  • 22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - T280563
  • 22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - T280563
  • 22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
  • 21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where the mgmt interface had enabled alerts but the actual host didn't
  • 21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they had all been manually disabled, but none of the checks had changed status in 50 days, which suggests someone forgot to turn them back on (a common issue with disabling notifications)
  • 21:16 mutante: backup1001 - sudo check_bacula.py --icinga
  • 20:54 marostegui: Stop mysql on tendril for the UTC night, dbtree and tendril will remain down for a few hours T281486
  • 20:16 marostegui: Restart tendril database - T281486
  • 20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3 refs T278347
  • 19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 refs T278347 (duration: 01m 08s)
  • 19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 refs T278347
  • 19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
  • 19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
  • 19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
  • 18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
  • 18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
  • 18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: I926797, T281480 (duration: 01m 08s)
  • 18:33 mutante: LDAP - added mmandere to wmf group (T281344)
  • 18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: I926797, T281480 (duration: 01m 09s)
  • 17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:29 ryankemper: T281498 `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
  • 16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
  • 16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0|1] wikis to [VERSION]" (duration: 00m 01s)
  • 16:22 ryankemper: T281498 `ryankemper@wdqs2004:~$ sudo depool`
  • 16:20 ryankemper: T281498 `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
  • 16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 39s)
  • 16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789
  • 16:12 papaul: powerdown thanos-fe2001 for memory swap
  • 15:44 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here)
  • 15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
  • 15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
  • 15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
  • 15:34 ryankemper: [WDQS] pooled `wdqs2001`
  • 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
  • 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
  • 13:44 moritzm: installing Java security updates on stat* hosts
  • 13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
  • 13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
  • 13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
  • 13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
  • 13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 59s)
  • 13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789
  • 13:11 moritzm: installing postgresql-11 security updates
  • 13:08 jbond42: merge netbase change to manage /etc/services
  • 13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
  • 13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
  • 12:36 Amir1: upgrading Quiddity to admin in mailman3
  • 12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
  • 12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
  • 12:26 moritzm: installing grub2 updates from buster point release
  • 12:06 jbond42: update debmonitor.discover.wmnet ssl cert
  • 11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: Undeploy JADE from production, Part III (T281418) (duration: 01m 07s)
  • 11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Undeploy JADE from production, Part II (T281418), Part I (duration: 01m 06s)
  • 11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Undeploy JADE from production, Part I (T281418) (duration: 01m 07s)
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857) (duration: 01m 07s)
  • 11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: Another fix for token cookie handling (T281346) (duration: 01m 07s)
  • 11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: Another fix for token cookie handling (T281346) (duration: 01m 08s)
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
  • 11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857) (duration: 01m 07s)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
  • 10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades T275873
  • 10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: Fix CX token cookie (T281346) (duration: 01m 08s)
  • 10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: Fix CX token cookie (T281346) (duration: 01m 09s)
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
  • 10:27 marostegui: Upgrade kernel on db1110
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
  • 09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on apt.w.o
  • 09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
  • 09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
  • 09:01 jynus: stop replication and checking data of db2100:s7
  • 08:57 marostegui: Upgrade kernel on db2133
  • 08:51 marostegui: Upgrade kernel on db2125
  • 08:50 marostegui: Upgrade kernel on db2124
  • 08:46 marostegui: Upgrade kernel on db2122
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
  • 08:39 marostegui: Upgrade kernel on db2121
  • 08:33 marostegui: Upgrade kernel on db2120
  • 08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
  • 08:27 marostegui: Upgrade kernel on db2115
  • 08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
  • 08:19 marostegui: Upgrade kernel on db2114
  • 08:12 marostegui: Upgrade kernel on db2109
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
  • 07:54 marostegui: Upgrade kernel on db2089
  • 07:48 jynus: rolling restart of bacula hosts T273182
  • 07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
  • 07:28 marostegui: Stop mysql and upgrade kernel on pc1007
  • 07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
  • 06:53 godog: add 100G to prometheus/ops in eqiad
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
  • 05:22 marostegui: Check tables on db1121 (this will cause lag on s4 commonswiki, on wikireplicas)
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
  • 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
  • 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
  • 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl T258361', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
  • 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
  • 02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
  • 02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
  • 02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
  • 02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
  • 01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: I5869b3c3ba4a (duration: 01m 08s)
  • 01:23 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:19 ryankemper: T280382 Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
  • 01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: T281405 (duration: 01m 08s)
  • 00:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 00:06 ryankemper: T280382 `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`

2021-04-28

  • 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
  • 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
  • 22:44 dwisehaupt: civiproxy revision changed to 99cecb924a - initial rollout of code for testing
  • 22:26 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:18 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
  • 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
  • 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 21:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:37 ryankemper: T280382 `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
  • 21:32 ryankemper: T280382 [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
  • 21:24 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
  • 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
  • 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 refs T278347 (duration: 01m 07s)
  • 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 refs T278347
  • 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw (T278516)
  • 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: b392dba: Fix incorrect ItemId typehint in Lua bindings (T281361) (duration: 01m 09s)
  • 16:52 papaul: powerdown logstash2034 for relocation
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
  • 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
  • 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
  • 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
  • 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
  • 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
  • 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
  • 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
  • 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
  • 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
  • 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
  • 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
  • 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
  • 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org T275873
  • 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
  • 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
  • 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
  • 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
  • 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
  • 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
  • 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
  • 14:32 moritzm: installing iproute2 updates from buster point release
  • 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
  • 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
  • 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
  • 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
  • 14:26 moritzm: installing net-snmp updates from buster point release
  • 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
  • 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - T271573
  • 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
  • 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - T271573
  • 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
  • 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
  • 13:02 moritzm: upgrading deployment servers to PHP 7.4.32
  • 12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
  • 12:48 jayme: restarting pybal on lvs2009 - T271573
  • 12:45 moritzm: upgrading labweb to PHP 7.4.32
  • 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:42 jayme: restarting pybal on lvs5003,lvs4007 - T271573
  • 12:39 jayme: restarting pybal on lvs2010 - T271573
  • 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshots1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
  • 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
  • 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
  • 11:22 Urbanecm: EU B&C window done
  • 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: 8d0ae5e: Separate reference preview settings in beta & non-beta (T281235) (duration: 01m 08s)
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ddbc378: Enable partial action blocks on testwiki (T280528) (duration: 01m 07s)
  • 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 10:44 jbond42: updated the check-raid nrpe script to python3
  • 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
  • 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
  • 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
  • 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
  • 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
  • 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: 96ad0d4: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
  • 08:41 urbanecm@deploy1002: sync-file aborted: 96ad0d4: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
  • 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling php7.2-fpm restarts
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
  • 07:52 effie: update php7.2 on api servers && rolling php7.2-fpm restarts
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
  • 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3317 T266486 T268392 T273360
  • 07:27 effie: update php7.2 on appservers && rolling php7.2-fpm restarts
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
  • 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
  • 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
  • 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 T266486 T268392 T273360
  • 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
  • 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
  • 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) T281135
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 T258361', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 T278214', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T278214', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
  • 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - T278214
  • 04:14 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:08 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 04:08 marostegui: Start replication changes, connect everything to db1163 T278214
  • 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover T278214', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
  • 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
  • 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
  • 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
  • 03:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE

2021-04-27

  • 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
  • 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
  • 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
  • 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
  • 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
  • 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
  • 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
  • 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
  • 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
  • 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
  • 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
  • 20:32 bblack: re-pooling codfw public traffic - T279457
  • 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment (T281226) to 1.37.0-wmf.3 (duration: 01m 12s)
  • 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
  • 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
  • 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
  • 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
  • 19:35 papaul: powerdown ms-backup2001 for maintenance
  • 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
  • 19:07 papaul: powerdown logstash2035 for maintenance
  • 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
  • 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
  • 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
  • 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
  • 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
  • 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
  • 18:33 mutante: people1003 - rebooting, trying to get new VM to work
  • 18:33 Urbanecm: Morning B&C window done
  • 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 91a85f2: ac770bf: Enable language in header for office and testwiki users (T280526) (duration: 01m 19s)
  • 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - T279457
  • 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
  • 18:20 bblack: cp203[56] - repooling in etcd - T279457
  • 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - T279457
  • 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:02 ejegg: update payments-wiki from 9a4eef1375 to 44570561f2
  • 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
  • 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
  • 17:34 papaul: powerdown moss-fe2001 for maintenance
  • 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:19 ryankemper: T281215 Banned `elastic2043` from codfw cirrussearch cluster
  • 17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:14 papaul: powerdown kafka-logging2003 for maintenance
  • 17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:52 papaul: powerdown elastic2045 for maintenance
  • 16:49 papaul: powerdown ms-be2042 for maintenance
  • 16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
  • 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
  • 16:22 effie: upgrading scap 3.17.1-1 on mediawiki canaries - T279695
  • 16:18 effie: uploading scap_3.17.1-1
  • 16:18 effie: uploading scap_3.17.1-1
  • 15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
  • 14:48 moritzm: installing file/libmagic updates from buster point release
  • 14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - T279457
  • 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
  • 14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
  • 14:36 bblack: cp203[56] - depool all etcd services via confctl - T279457
  • 14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
  • 14:33 bblack: dns2001 - depooling for T279457 (disable puppet + stop bird)
  • 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
  • 14:31 moritzm: installing imagemagick security updates
  • 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
  • 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 14:19 moritzm: installing xen security updates
  • 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
  • 14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
  • 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
  • 14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
  • 13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
  • 13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
  • 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
  • 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
  • 13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
  • 13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
  • 13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: fe2a042: WikiPageConfigValidation: Mentor lists and help desk can be null (T281229) (duration: 01m 06s)
  • 13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
  • 13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
  • 13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
  • 13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
  • 12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
  • 12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "URGENT: Disable GlobalUsage" (T281242) (duration: 01m 08s)
  • 12:44 hashar: Restarted CI Jenkins for plugins upgrade
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
  • 12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: Avoid reading primary unless absolutely necessary (T281238) (duration: 01m 09s)
  • 12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: Avoid reading primary unless absolutely necessary (T281238) (duration: 01m 09s)
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
  • 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
  • 11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
  • 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
  • 10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
  • 10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
  • 10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
  • 10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
  • 09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
  • 09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
  • 09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
  • 09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
  • 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
  • 09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
  • 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
  • 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
  • 09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
  • 09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
  • 09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
  • 09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
  • 09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
  • 09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
  • 08:36 XioNoX: standardize management routers ACLs with Capirca
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
  • 08:24 hashar: Restarting CI Jenkins for plugins upgrade
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
  • 08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
  • 08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
  • 08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
  • 08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
  • 08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
  • 07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
  • 07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
  • 07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
  • 07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
  • 07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
  • 07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
  • 07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
  • 06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - T279281
  • 06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
  • 06:37 liw: version 1.37.0-wmf.3 was branched at 20ab303 for T278347
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
  • 06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
  • 05:30 XioNoX: push pfw fw policies - T281137
  • 05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o (T281213)
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
  • 05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) T258361
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
  • 05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
  • 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
  • 04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye (T280887)
  • 04:19 marostegui: Set phabricator on read only T279625
  • 03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
  • 03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
  • 03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
  • 03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:17 ryankemper: T280382 `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:29 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id T280382` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:21 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
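
The dbctl entries for this day show hosts being repooled in steps (for example db1166 at 25%, 50%, 75% and 100%). As a rough sketch of how such a ramp could be reconstructed from the log text, the Python below parses a few entries copied from above, with the paste URLs and config paths trimmed. The regular expression, the `repool_ramps` helper and the minutes-since-midnight representation are illustrative assumptions, not part of dbctl or any Wikimedia tooling.

```python
import re
from collections import defaultdict

# Sample entries copied from the log above (paste URLs and config paths trimmed).
LOG_LINES = [
    "08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166'",
    "08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166'",
    "08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166'",
    "08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166'",
    "07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change'",
]

# Hypothetical pattern: optional bullet, HH:MM, user@host, then the repool message.
REPOOL_RE = re.compile(
    r"^\s*(?:•\s*)?(\d{2}):(\d{2}) \S+ dbctl commit \(dc=all\): "
    r"'(\S+) \(re\)pooling @ (\d+)%"
)


def repool_ramps(lines):
    """Group repool steps by instance as (minutes since midnight, percent), oldest first."""
    ramps = defaultdict(list)
    for line in lines:
        match = REPOOL_RE.match(line)
        if match is None:
            continue  # depool commits and unrelated entries are skipped
        hours, minutes, instance, percent = match.groups()
        ramps[instance].append((int(hours) * 60 + int(minutes), int(percent)))
    return {instance: sorted(steps) for instance, steps in ramps.items()}


def step_intervals(steps):
    """Return the gaps in minutes between consecutive repool steps."""
    return [later[0] - earlier[0] for earlier, later in zip(steps, steps[1:])]


if __name__ == "__main__":
    for instance, steps in repool_ramps(LOG_LINES).items():
        print(instance, steps, step_intervals(steps))
```

The sketch ignores the date, so it only makes sense within a single day's entries. Instances logged with a port, such as db1105:3311, are kept as separate keys.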

2021-04-26

  • 23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
  • 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
  • 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
  • 22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
  • 22:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
  • 20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix d238db8
  • 20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
  • 20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
  • 19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
  • 18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
  • 18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
  • 18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
  • 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
  • 18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
  • 18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
  • 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2d16f62: elwiki: Update Growth experiments configuration (T280172) (duration: 00m 58s)
  • 18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: 5ace4e1: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
  • 17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
  • 16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
  • 16:40 mutante: gerrit1001 - reload apache2
  • 16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
  • 16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
  • 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
  • 15:06 moritzm: installing jquery security updates on stretch
  • 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:28 moritzm: installing ldap-replica1003/1004
  • 14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
  • 14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
  • 13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
  • 13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
  • 13:14 moritzm: installing ldap-replica2005/2006
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
  • 12:57 moritzm: installing gst-plugins-base1.0 security updates
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
  • 12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
  • 12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
  • 12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
  • 12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
  • 12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
  • 12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 T279281 (duration: 00m 56s)
  • 11:57 marostegui: Restart es4 primary master - T279281
  • 11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 T279281 (duration: 00m 56s)
  • 11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
  • 11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
  • 11:46 Urbanecm: EU B&C done
  • 11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: a347517: Fix suggested values not being shown when the params type isnt specified (T280688) (duration: 00m 57s)
  • 11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata" (duration: 00m 57s)
  • 11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2b5b640: Enable ContentTranslation as a default tool for 11 Wikipedias (T279422) (duration: 00m 57s)
  • 10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
  • 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
  • 10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
  • 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 10:26 effie: upgrading mw* servers php7.2 in codfw
  • 10:25 marostegui: Deploy schema change on s4 codfw, lag will appear T276292
  • 10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
  • 10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
  • 10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
  • 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
  • 10:18 moritzm: installing systemd updates from buster 10.9 point release
  • 10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
  • 10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
  • 09:42 moritzm: installing clamav security updates on otrs1001
  • 09:38 godog: reboot ms-be1062, kernel backtrace saved
  • 09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
  • 09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
  • 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
  • 09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
  • 09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
  • 09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
  • 09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
  • 09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
  • 08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
  • 08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: f01a6da: GrowthExperiments: Enable community configuration on testwiki (T274520) (duration: 00m 57s)
  • 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: 88da822: GrowthExperiments: Do not enable community configuration outside of beta wikis (T274520) (duration: 00m 59s)
  • 08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts T281090
  • 08:13 moritzm: installing lxml security updates on stretch
  • 07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
  • 07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
  • 07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
  • 07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
  • 07:24 moritzm: installing pear security updates
  • 07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 T280989
  • 06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
  • 03:43 kart_: Updated cxserver to 2021-04-21-044024-production (T279045)
  • 03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2021-04-25

  • 15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) (T281066)

2021-04-24

  • 22:24 bstorm: Rebooting labstore1007 from ilo after crash

2021-04-23

  • 21:36 foks: removing 1 file for legal compliance
  • 20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb (T280989)
  • 19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye T280989
  • 19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
  • 17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
  • 14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
  • 14:25 moritzm: revert back bullseye image to daily build from last week (to rule out potential reimage issue)
  • 13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - T280773
  • 12:50 jbond42: upload new debmonitor-client packages
  • 11:50 moritzm: installing perf updates from Buster 10.9 point release
  • 10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
  • 09:54 moritzm: installing xen security updates
  • 09:49 moritzm: installing xorg-server security updates
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
  • 08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
  • 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
  • 08:13 moritzm: upgrading d-i image for bullseye to RC1 release T275873
  • 08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
  • 08:12 moritzm: upgrading d-i image for bullseye to RC1 release
  • 08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
  • 07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
  • 07:56 jynus: deleting db1156 s2 database and reloading it from logical backups T280492
  • 07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
  • 05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 T258361', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json
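
The db1079 entries above step the pooled percentage through 25, 50, 75 and 100 at roughly 15-minute intervals. The following is a minimal sketch of that schedule only. The function name `repool_schedule` and its parameters are hypothetical, the step values and spacing are read off the log entries rather than any tool's documentation, and nothing here calls dbctl.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a gradual repool.

    Hypothetical helper: the defaults mirror the pattern visible in the
    log (25/50/75/100 at 15-minute spacing); they are an assumption,
    not a documented policy.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first db1079 repool entry (08:52).
plan = repool_schedule(datetime(2021, 4, 23, 8, 52))
for when, pct in plan:
    print(f"{when:%H:%M} pool at {pct}%")
```

With the 08:52 start, the computed steps land on 08:52, 09:07, 09:22 and 09:37, matching the minute stamps of the four dbctl commits logged for db1079.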

2021-04-22

  • 17:26 marostegui: Stop mysql on tendril/dbtree database
  • 16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: e914e8a icons: Add 'share' icon (#455) (duration: 00m 06s)
  • 16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: e914e8a icons: Add 'share' icon (#455)
  • 13:23 marostegui: Tendril and dbtree are up but in a degraded state (slow response)
  • 13:19 marostegui: Tendril and dbtree are down at the moment
  • 12:46 Urbanecm: Start server-side upload for 2 video files (T280763, T280524)
  • 12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
  • 04:55 eileen: civicrm revision changed from 42ca3cf65a to 33a63d5789, config revision is cf07e7ba0b
  • 02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
  • 02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
  • 01:34 eileen: civicrm revision changed from 35a8dd33ba to 42ca3cf65a, config revision is cf07e7ba0b
  • 00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
  • 00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
  • 00:03 legoktm: subscribed all list admins to the listadmins@ mailing list (T280716)

2021-04-21

  • 23:58 eileen: tools revision changed from 3d950fffbd to c26a8c0cb6
  • 23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
  • 20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
  • 20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
  • 20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
  • 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
  • 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
  • 19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:46 mutante: creating a ganeti VM to test bullseye install
  • 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
  • 19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
  • 19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:46 Urbanecm: Morning B&C done
  • 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: f831d16: Make the logistic regression image search default (T271799) (duration: 00m 58s)
  • 18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f6d076a: Update $wgGEHomepageNewAccountVariants (T278123) (duration: 00m 58s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1ae5ca5: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere (T279853) (duration: 00m 59s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e252de0: eswiki: Push Growth features out of dark mode (T278235) (duration: 01m 00s)
  • 17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) T278614
  • 15:54 legoktm: T280744: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
  • 15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # T279853
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
  • 15:02 moritzm: installing jquery security updates on buster
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
  • 14:25 jbond42: upload new version of debmonitor-client to apt
  • 13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # T279853
  • 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
  • 13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # T279853
  • 13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
  • 12:21 moritzm: installing failoid2002
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
  • 11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:32 awight: EU backport window complete
  • 11:31 moritzm: installing failoid1002
  • 11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: Send 0 edits userEditCountBucket for anons (T210106) (duration: 00m 59s)
  • 10:41 jbond42: switch debmonitor-client to cfssl (second try)
  • 10:37 jbond42: upload golang-cfssl packages for jessie and stretch
  • 10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
  • 10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
  • 10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
  • 10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
  • 10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
  • 10:21 hnowlan: rebooting eventlog1002 for kernel update
  • 10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
  • 09:56 jbond42: switch debmonitor-clients to use cfssl
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
  • 09:08 elukey: upgrade hue on an-tool1009 to 4.9
  • 09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
  • 09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
  • 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
  • 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
  • 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 08:56 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
  • 08:55 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 08:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 08:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 08:53 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
  • 08:52 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 08:50 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 10s)
  • 08:50 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15493 and previous config saved to /var/cache/conftool/dbconfig/20210421-084555-root.json
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 08:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 07:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 07:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 07:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
  • 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 07:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
  • 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 06:49 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 06:49 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 06:42 elukey: upload hue_4.9.0-2+deb10u1 to buster-wikimedia
  • 06:11 marostegui: Stop MySQL on db1074 to clone db1156 (there will be lag in s2 in wiki replicas) T258361
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1156 T258361', diff saved to https://phabricator.wikimedia.org/P15491 and previous config saved to /var/cache/conftool/dbconfig/20210421-061019-marostegui.json
  • 06:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1086.eqiad.wmnet
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1086.eqiad.wmnet
  • 00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 00:15 ryankemper: [WDQS] Pooled `wdqs1003`
  • 00:14 ryankemper: [WDQS] Pooled `wdqs2008`
  • 00:07 ryankemper: `sudo -i wmf-auto-reimage-host -p T280382 wdqs1006.eqiad.wmnet`
  • 00:04 ryankemper: [WDQS] pooled `wdqs1004`
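
Many entries above are paired cookbook START and END lines. The following is a rough sketch of how such pairs could be matched to get a run time in minutes. The regular expression and the function name `cookbook_durations` are assumptions based only on the line shapes visible in this log, and runs that cross midnight are not handled.

```python
import re

# Assumed line shape, taken from the entries in this log, e.g.
# "08:27 user@host: END (PASS) - Cookbook name (exit_code=0) for host X"
LINE = re.compile(
    r"(?P<hh>\d{2}):(?P<mm>\d{2}) (?P<who>\S+): "
    r"(?P<kind>START|END \((?P<status>\w+)\)) - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=\d+\))?(?P<rest>.*)"
)


def cookbook_durations(lines):
    """Pair each END entry with its START; lines are newest-first, as in the log.

    Returns (cookbook, target, status, minutes) tuples. Entries are matched
    on user, cookbook name and the trailing target text.
    """
    pending_ends = {}
    results = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # not a cookbook entry
        key = (m["who"], m["name"], m["rest"].strip())
        minute = int(m["hh"]) * 60 + int(m["mm"])
        if m["kind"] == "START":
            if key in pending_ends:
                end_minute, status = pending_ends.pop(key)
                results.append((key[1], key[2], status, end_minute - minute))
        else:
            pending_ends[key] = (minute, m["status"])
    return results


sample = [
    "08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet",
    "08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet",
]
durations = cookbook_durations(sample)
print(durations)
```

Because the log is newest-first, the sketch remembers each END until the matching START appears further down. A START with no remembered END, or an END whose START is missing, is simply left unpaired.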

2021-04-20

  • 23:46 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 57s)
  • 23:44 urbanecm@deploy1002: Synchronized wmf-config/config/urwiki.yaml: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 57s)
  • 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 58s)
  • 23:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=urwiki GrowthExperiments # T280067
  • 23:38 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 3/3) (duration: 00m 56s)
  • 23:36 urbanecm@deploy1002: Synchronized wmf-config/config/elwiki.yaml: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 2/3) (duration: 00m 57s)
  • 23:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 1/3) (duration: 00m 57s)
  • 23:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
  • 23:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=elwiki GrowthExperiments # T280172
  • 23:31 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 3/3) (duration: 00m 57s)
  • 23:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # T279587
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/config/cawiki.yaml: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 2/3) (duration: 00m 57s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 1/3) (duration: 00m 57s)
  • 23:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=cawiki GrowthExperiments # T280673
  • 23:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
  • 23:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
  • 23:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
  • 23:03 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
  • 22:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:10 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 20:52 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # T279853
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1020.wikimedia.org
  • 20:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # T279853
  • 20:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1020.wikimedia.org
  • 20:36 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # T279853
  • 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1017-1019].wikimedia.org
  • 20:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # T279853
  • 20:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # T279853
  • 20:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # T279853
  • 20:29 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # T279853
  • 20:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # T279853
  • 20:22 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # T279853
  • 20:21 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # T279853
  • 20:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # T279853
  • 20:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # T279853
  • 20:16 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1017-1019].wikimedia.org
  • 20:15 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # T279853
  • 20:13 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # T279853
  • 20:08 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1016.wikimedia.org
  • 18:34 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # T279853
  • 18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.wikimedia.org
  • 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: 4d1969d: 1fbb8e9: MentorStore: Set wasPosted to true in command line mode (T275773) (duration: 00m 59s)
  • 17:26 XioNoX: boot cr1-codfw:fpc1 - T277341
  • 17:16 papaul: Adding a MPC7E to cr1-codfw
  • 16:32 arturo: merging change to core route firewall https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681316 (T272587)
  • 16:15 andrewbogott: updating core routers config with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681315
  • 15:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host eventlog1003.eqiad.wmnet
  • 15:22 urbanecm@deploy1002: Synchronized docroot/noc/conf/debug.json: dc6647b: remove mwdebug1003 from list of debug servers (T267248) (duration: 00m 58s)
  • 15:20 urbanecm@deploy1002: Synchronized debug.json: dc6647b: remove mwdebug1003 from list of debug servers (T267248) (duration: 00m 57s)
  • 15:14 hnowlan@cumin1001: START - Cookbook sre.ganeti.makevm for new host eventlog1003.eqiad.wmnet
  • 15:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:59 volker-e@deploy1002: Finished deploy [design/style-guide@c4d8314]: Deploy design/style-guide: c4d8314 “Components”: Fix “Buttons” active states (#460) (duration: 00m 07s)
  • 14:58 volker-e@deploy1002: Started deploy [design/style-guide@c4d8314]: Deploy design/style-guide: c4d8314 “Components”: Fix “Buttons” active states (#460)
  • 14:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:38 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 14:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 14:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 14:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 14:30 moritzm: installing exim updates from Buster point release
  • 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:25 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a] (duration: 04m 56s)
  • 14:25 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:20 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a]
  • 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:17 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a] (duration: 00m 07s)
  • 14:17 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a]
  • 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
  • 14:16 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
  • 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
  • 14:15 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
  • 14:15 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
  • 14:14 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
  • 14:14 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a] (duration: 14m 50s)
  • 14:11 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:04 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:01 jiji@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet,cluster=videoscaler
  • 13:59 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a]
  • 13:42 moritzm: upgrading mw1276 to PHP 7.2.34
  • 13:40 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:40 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 13s)
  • 13:40 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
  • 13:38 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:36 otto@deploy1002: Finished deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint (duration: 05m 17s)
  • 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:33 moritzm: upgrading mw1261 to PHP 7.2.34
  • 13:31 otto@deploy1002: Started deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint
  • 13:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 13:26 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 13:22 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:21 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
  • 13:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/includes/actions/RollbackAction.php: ccbfcf2: Do not mark rollbacks as bot edits (T280655) (duration: 00m 57s)
  • 13:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
  • 13:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:07 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
  • 13:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
  • 12:58 moritzm: reimaging cumin2002 to bullseye T276589
  • 12:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 12:49 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 12:47 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 12:42 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf1 to component/php72
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 to check its tables T280492', diff saved to https://phabricator.wikimedia.org/P15483 and previous config saved to /var/cache/conftool/dbconfig/20210420-124118-marostegui.json
  • 12:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
  • 12:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 12:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 12:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 12:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
  • 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:18 CFisch_WMDE: European mid-day backport window done
  • 12:05 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add NS_PROJECT alias for azwiki (T280577) (duration: 00m 57s)
  • 12:04 moritzm: drain ganeti5003
  • 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 11:54 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: CommentFormatter: Add ext-discussiontools-section class instead of overwriting (T280433) (duration: 00m 57s)
  • 11:47 moritzm: failover ganeti master in eqsin to ganeti5001
  • 11:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 11:38 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Backport: Add filtering for the suggested values combo box (T271898) (duration: 00m 58s)
  • 11:15 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add default import sources (T214139) (duration: 00m 58s)
  • 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:07 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:49 _joe_: temporarily installing some Python packages on deploy1002 for testing
  • 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
  • 10:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
  • 10:20 moritzm: drain ganeti5001
  • 10:11 hnowlan: opening access to cassandra on new AQS hosts (aqs101*) to analytics-in4 filter
  • 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1001.eqiad.wmnet
  • 10:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host aphlict1001.eqiad.wmnet
  • 09:42 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
  • 09:42 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
  • 09:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 09:40 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
  • 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
  • 08:50 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
  • 08:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
  • 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
  • 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
  • 08:09 dcaro: reprepro updating thirdparty/ceph-octopus repo
  • 08:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
  • 08:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
  • 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
  • 08:05 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
  • 08:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1086 from dbctl T278229', diff saved to https://phabricator.wikimedia.org/P15482 and previous config saved to /var/cache/conftool/dbconfig/20210420-075949-marostegui.json
  • 07:38 XioNoX: BGP: prioritize directly connected peers - T280054
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15480 and previous config saved to /var/cache/conftool/dbconfig/20210420-073808-root.json
  • 07:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
  • 07:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15479 and previous config saved to /var/cache/conftool/dbconfig/20210420-072305-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15478 and previous config saved to /var/cache/conftool/dbconfig/20210420-070801-root.json
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15477 and previous config saved to /var/cache/conftool/dbconfig/20210420-065257-root.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE

2021-04-19

  • 22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 22:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 22:37 Trey314159: reindexing wikidata on cloudelastic finished/failed (T274200)
  • 22:37 Trey314159: reindexing commons and wikidata on elastic@eqiad finished/failed (T274200)
  • 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
  • 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
  • 21:03 sbassett: Deployed security patch for T280226
  • 19:56 dcausse: repool wdqs1005
  • 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
  • 18:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
  • 18:56 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/tests: Factor out rollback logic from WikiPage - /tests (duration: 00m 59s)
  • 18:55 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/maintenance: Factor out rollback logic from WikiPage - /maintenance (duration: 00m 57s)
  • 18:51 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/includes/: Factor out rollback logic from WikiPage - /includes (duration: 01m 01s)
  • 18:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
  • 18:47 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=thumbor,name=thumbor2001.codfw.wmnet
  • 18:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
  • 18:39 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: T274436 Math: Enable RESTBase-less Wikidata math validation (duration: 00m 56s)
  • 18:34 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
  • 18:21 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: T249745 [EventBus] Make eventgate-main timeout consistent with envoy (duration: 00m 56s)
  • 18:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/: 66d137b: Remove <header> tags around headings for compat with MobileFrontend (T280433) (duration: 00m 59s)
  • 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
  • 18:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php: 0233507: DatabaseMentorStore: Fix deprecation warning in upsert query (T280525) (duration: 00m 57s)
  • 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
  • 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1004.eqiad.wmnet
  • 17:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
  • 17:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
  • 17:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
  • 17:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
  • 16:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
  • 16:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
  • 16:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
  • 16:25 hoo: Updated the Wikidata property suggester with data from the 2021-04-12 JSON dump (with pre-applied T132839 workarounds)
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15474 and previous config saved to /var/cache/conftool/dbconfig/20210419-161134-root.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15473 and previous config saved to /var/cache/conftool/dbconfig/20210419-155631-root.json
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 80%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15472 and previous config saved to /var/cache/conftool/dbconfig/20210419-154127-root.json
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 70%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15471 and previous config saved to /var/cache/conftool/dbconfig/20210419-152623-root.json
  • 15:24 volans: reverted debmonitor-client to 0.2.0-1 on apt.w.o for jessie-wikimedia
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15470 and previous config saved to /var/cache/conftool/dbconfig/20210419-151119-root.json
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15469 and previous config saved to /var/cache/conftool/dbconfig/20210419-145616-root.json
  • 14:53 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename RelatedArticles wmg variables to wg (duration: 00m 56s)
  • 14:53 jbond42: update debmonitor-client - T280484
  • 14:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove RelatedArticles extension function and wmg to wg mapping (duration: 00m 56s)
  • 14:48 reedy@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Use namespaced PoolCounter Client (duration: 00m 57s)
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 T278229', diff saved to https://phabricator.wikimedia.org/P15468 and previous config saved to /var/cache/conftool/dbconfig/20210419-144422-marostegui.json
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15467 and previous config saved to /var/cache/conftool/dbconfig/20210419-144112-root.json
  • 14:41 volans: uploaded debmonitor-client 0.2.8 to apt.w.o for jessie, stretch, buster, bullseye
  • 14:29 hnowlan: imported envoyproxy_1.16.3-1 debs to envoy-future component
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15466 and previous config saved to /var/cache/conftool/dbconfig/20210419-142608-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 20%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15465 and previous config saved to /var/cache/conftool/dbconfig/20210419-141105-root.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15464 and previous config saved to /var/cache/conftool/dbconfig/20210419-135601-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15463 and previous config saved to /var/cache/conftool/dbconfig/20210419-134057-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15462 and previous config saved to /var/cache/conftool/dbconfig/20210419-132554-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15461 and previous config saved to /var/cache/conftool/dbconfig/20210419-131936-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15460 and previous config saved to /var/cache/conftool/dbconfig/20210419-131501-marostegui.json
  • 12:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bd07630: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere (T279853) (duration: 00m 57s)
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15459 and previous config saved to /var/cache/conftool/dbconfig/20210419-125600-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15458 and previous config saved to /var/cache/conftool/dbconfig/20210419-125407-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1182 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15457 and previous config saved to /var/cache/conftool/dbconfig/20210419-125301-marostegui.json
  • 12:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef0f68e: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW (T279853) (duration: 00m 57s)
  • 12:38 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # T279853
  • 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
  • 12:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3e3cce1: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 58s)
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
  • 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
  • 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 11:33 moritzm: imported debdeploy 0.0.99.13-1+deb11u1 to bullseye-wikimedia T275873
  • 11:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # T279853
  • 11:11 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # T279853
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 03f8ed8: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 57s)
  • 11:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript variable for the rest of wikis (T72470) (duration: 00m 57s)
  • 11:02 moritzm: import prometheus-rsyslog-exporter for bullseye-wikimedia/main
  • 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:34 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
  • 10:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
  • 10:24 hnowlan: imported 1.16.3 into envoy-future
  • 10:22 moritzm: reimaging theemin to bullseye
  • 10:15 dcausse: depooling wdqs1005
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
  • 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
  • 10:05 arturo: aborrero@apt1001:~ $ sudo -i reprepro --component thirdparty/kubeadm-k8s-1-18 update buster-wikimedia
  • 10:04 arturo: aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished (remove old buster-wikimedia|thirdparty/kubeadm-k8s-1-15,16 repos and packages)
  • 09:56 ema: cp3051: varnish-frontend-restart to apply exp policy settings changes starting from empty cache T275809
  • 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
  • 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
  • 09:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15454 and previous config saved to /var/cache/conftool/dbconfig/20210419-092251-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 T280492', diff saved to https://phabricator.wikimedia.org/P15453 and previous config saved to /var/cache/conftool/dbconfig/20210419-092234-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15452 and previous config saved to /var/cache/conftool/dbconfig/20210419-091535-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 90%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15451 and previous config saved to /var/cache/conftool/dbconfig/20210419-090747-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15450 and previous config saved to /var/cache/conftool/dbconfig/20210419-090031-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 80%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15449 and previous config saved to /var/cache/conftool/dbconfig/20210419-085243-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 T272008', diff saved to https://phabricator.wikimedia.org/P15448 and previous config saved to /var/cache/conftool/dbconfig/20210419-084834-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15447 and previous config saved to /var/cache/conftool/dbconfig/20210419-084528-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T272008', diff saved to https://phabricator.wikimedia.org/P15446 and previous config saved to /var/cache/conftool/dbconfig/20210419-084523-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 70%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15445 and previous config saved to /var/cache/conftool/dbconfig/20210419-083740-root.json
  • 08:35 ema: restart debmonitor-client.service on cp4030, dns5002, an-worker1106 T280484
  • 08:34 marostegui: Testing log
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15444 and previous config saved to /var/cache/conftool/dbconfig/20210419-083021-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15443 and previous config saved to /var/cache/conftool/dbconfig/20210419-083018-root.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 T272008', diff saved to https://phabricator.wikimedia.org/P15442 and previous config saved to /var/cache/conftool/dbconfig/20210419-082559-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15441 and previous config saved to /var/cache/conftool/dbconfig/20210419-082236-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15440 and previous config saved to /var/cache/conftool/dbconfig/20210419-082000-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15439 and previous config saved to /var/cache/conftool/dbconfig/20210419-081517-root.json
  • 08:07 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
  • 08:07 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15438 and previous config saved to /var/cache/conftool/dbconfig/20210419-080732-root.json
  • 08:07 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15437 and previous config saved to /var/cache/conftool/dbconfig/20210419-080456-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15436 and previous config saved to /var/cache/conftool/dbconfig/20210419-080454-root.json
  • 08:03 moritzm: installing python-bleach security updates
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15435 and previous config saved to /var/cache/conftool/dbconfig/20210419-080013-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 40%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15434 and previous config saved to /var/cache/conftool/dbconfig/20210419-075229-root.json
  • 07:51 moritzm: upgrade mwdebug2002 to PHP 7.2.34
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15433 and previous config saved to /var/cache/conftool/dbconfig/20210419-074953-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15432 and previous config saved to /var/cache/conftool/dbconfig/20210419-074950-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15431 and previous config saved to /var/cache/conftool/dbconfig/20210419-074510-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 T272008', diff saved to https://phabricator.wikimedia.org/P15430 and previous config saved to /var/cache/conftool/dbconfig/20210419-074155-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 30%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15429 and previous config saved to /var/cache/conftool/dbconfig/20210419-073725-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15428 and previous config saved to /var/cache/conftool/dbconfig/20210419-073449-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15427 and previous config saved to /var/cache/conftool/dbconfig/20210419-073446-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15426 and previous config saved to /var/cache/conftool/dbconfig/20210419-073425-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 20%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15425 and previous config saved to /var/cache/conftool/dbconfig/20210419-072221-root.json
  • 07:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:19 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15424 and previous config saved to /var/cache/conftool/dbconfig/20210419-071943-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15423 and previous config saved to /var/cache/conftool/dbconfig/20210419-071921-root.json
  • 07:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T272008', diff saved to https://phabricator.wikimedia.org/P15422 and previous config saved to /var/cache/conftool/dbconfig/20210419-071701-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 15%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15421 and previous config saved to /var/cache/conftool/dbconfig/20210419-070718-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15420 and previous config saved to /var/cache/conftool/dbconfig/20210419-070439-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15419 and previous config saved to /var/cache/conftool/dbconfig/20210419-070418-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T272008', diff saved to https://phabricator.wikimedia.org/P15418 and previous config saved to /var/cache/conftool/dbconfig/20210419-070035-marostegui.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15417 and previous config saved to /var/cache/conftool/dbconfig/20210419-065627-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15416 and previous config saved to /var/cache/conftool/dbconfig/20210419-065213-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15415 and previous config saved to /var/cache/conftool/dbconfig/20210419-064914-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T272008', diff saved to https://phabricator.wikimedia.org/P15414 and previous config saved to /var/cache/conftool/dbconfig/20210419-064600-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15413 and previous config saved to /var/cache/conftool/dbconfig/20210419-064123-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15412 and previous config saved to /var/cache/conftool/dbconfig/20210419-062620-root.json
  • 06:17 _joe_: upgrading envoy everywhere in eqiad T280317
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15411 and previous config saved to /var/cache/conftool/dbconfig/20210419-061116-root.json
  • 06:10 _joe_: upgrading envoy everywhere in codfw T280317
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15410 and previous config saved to /var/cache/conftool/dbconfig/20210419-060321-marostegui.json
  • 06:01 _joe_: rolling out further envoy upgrades T280317
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 10%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15409 and previous config saved to /var/cache/conftool/dbconfig/20210419-055613-root.json
  • 05:53 marostegui: Stop sanitarium master on s2 (lag will show up on clouddb* labsdb* hosts) T272008
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T272008', diff saved to https://phabricator.wikimedia.org/P15408 and previous config saved to /var/cache/conftool/dbconfig/20210419-055240-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15407 and previous config saved to /var/cache/conftool/dbconfig/20210419-054831-marostegui.json
  • 05:42 marostegui: Stop sanitarium master on s1 (lag will show up on clouddb* labsdb* hosts) T272008
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 T272008', diff saved to https://phabricator.wikimedia.org/P15406 and previous config saved to /var/cache/conftool/dbconfig/20210419-054158-marostegui.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15405 and previous config saved to /var/cache/conftool/dbconfig/20210419-053730-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15404 and previous config saved to /var/cache/conftool/dbconfig/20210419-053127-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1179 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15403 and previous config saved to /var/cache/conftool/dbconfig/20210419-053050-marostegui.json
  • 05:05 marostegui: Restart m2 database master T280251

2021-04-18

  • 06:40 Amir1: cleaning watchlist of User:Mr._Ibrahem in wikidatawiki (in main ns only)

2021-04-17

  • 16:16 Amir1: cleaning SuccuBot's watchlist in wikidatawiki
  • 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
  • 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
  • 00:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
  • 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
  • 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
  • 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
  • 00:14 ryankemper: T267927 `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
  • 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
  • 00:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
  • 00:08 ryankemper: T267927 Reload of `wdqs2003` complete
  • 00:07 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE

2021-04-16

  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug1003.eqiad.wmnet
  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
  • 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
  • 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
  • 23:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1003.eqiad.wmnet
  • 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwdebug1003.eqiad.wmnet
  • 23:47 mutante: decom'ing mwdebug1003, stretch VM created in T267248
  • 23:39 mutante: reimaging last 3 remaining stretch appservers with buster, mw1307, mw1402, mw1403
  • 23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
  • 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
  • 21:08 ejegg: updated fundraising python tools from ef54260b0d to 3d950fffbd
  • 20:40 Trey314159: reindexing wikidata on cloudelastic... AGAIN (T274200)
  • 17:48 ryankemper: T267927 Transferring from `wdqs2008`->`wdqs2003` to resolve the data corruption on `wdqs2003`
  • 17:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 17:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
  • 17:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
  • 17:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
  • 17:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
  • 17:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
  • 17:35 mutante: depooling mwdebug1003 (stretch VM, will be removed), mwdebug1001/1002 (buster) are unchanged
  • 17:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1003.eqiad.wmnet
  • 17:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
  • 17:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
  • 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
  • 17:03 ryankemper: T267927 Pooled `wdqs1007`, `wdqs2003`, `wdqs1008`, `wdqs2004`
  • 17:00 ryankemper: T267927 Following data transfers complete: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
  • 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 16:59 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:09 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:57 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 15:22 urbanecm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 14:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
  • 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
  • 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
  • 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
  • 14:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
  • 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
  • 14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
  • 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
  • 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
  • 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
  • 13:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
  • 12:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
  • 12:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
  • 12:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
  • 12:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
  • 12:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
  • 12:37 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:22 jayme: updated envoyproxy to 1.15.4-1 on 'A:mw-canary or A:restbase-canary'
  • 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
  • 11:02 moritzm: imported ferm 2.5.1-1+wmf1 to bullseye-wikimedia/main T275873
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
  • 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
  • 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
  • 10:44 arturo: merging homer change to cr-eqiad (T279342)
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
  • 10:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
  • 10:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
  • 10:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
  • 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
  • 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
  • 10:08 jayme: updated envoyproxy to 1.15.4-1 on mw1325.eqiad.wmnet,restbase1026.eqiad.wmnet
  • 10:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 10:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2011.codfw.wmnet
  • 10:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 10:00 jayme: updated envoyproxy to 1.15.4-1 on mwdebug1001.eqiad.wmnet
  • 09:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2011.codfw.wmnet
  • 09:55 jayme: imported envoyproxy_1.15.4-1 to stretch-wikimedia - T280317
  • 09:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2010.codfw.wmnet
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15384 and previous config saved to /var/cache/conftool/dbconfig/20210416-093446-root.json
  • 09:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2010.codfw.wmnet
  • 09:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2009.codfw.wmnet
  • 09:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2009.codfw.wmnet
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15383 and previous config saved to /var/cache/conftool/dbconfig/20210416-091942-root.json
  • 09:13 jayme: imported envoyproxy_1.15.4-1 to buster-wikimedia - T280317
  • 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15380 and previous config saved to /var/cache/conftool/dbconfig/20210416-090438-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15374 and previous config saved to /var/cache/conftool/dbconfig/20210416-084935-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15373 and previous config saved to /var/cache/conftool/dbconfig/20210416-083431-root.json
  • 07:53 elukey: run reprepro --delete clearvanished on apt1001 to clear all cloudera packages
  • 07:41 ema: cp-upload_ulsfo: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches T275809
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P15372 and previous config saved to /var/cache/conftool/dbconfig/20210416-071936-marostegui.json
  • 06:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
  • 06:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 06:48 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 06:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 06:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
  • 06:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
  • 05:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics-tool1001.eqiad.wmnet
  • 05:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
  • 05:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics-tool1001.eqiad.wmnet
  • 03:31 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
  • 03:26 ryankemper: T267927 Pooled `wdqs2001`
  • 03:22 ryankemper: T267927 Pooled `wdqs1006` and `wdqs2002`
  • 03:09 ryankemper: T267927 kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
  • 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 03:05 ryankemper: T267927 Last round of `data-transfer`s finished successfully, proceeding to next round
  • 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 00:30 Krinkle: Delete old data at doc1001:/srv/doc/cover/PasswordBlacklist (ref T254799)
  • 00:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op) (duration: 00m 08s)
  • 00:09 jforrester@deploy1002: Started deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op)

2021-04-15

  • 23:37 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/skin.json: Backport: Adjust floating override (T280260) (duration: 00m 56s)
  • 23:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/resources/skins.vector.styles.legacy/layouts/screen.less: Backport: Adjust floating override (T280260) (duration: 00m 56s)
  • 23:31 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: searchSatisfaction: Default userEditBucket back to 0 edits (T280294) (duration: 00m 57s)
  • 23:17 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create Draft namespace on itwiki (T280289) (duration: 00m 56s)
  • 23:09 jforrester@deploy1002: Synchronized wmf-config/logos.php: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 56s)
  • 23:08 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 56s)
  • 23:07 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 57s)
  • 23:06 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 57s)
  • 22:56 ryankemper: T267927 WDQS kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1006`, `wdqs2001`->`wdqs2002`, `wdqs2008`->`wdqs1003`
  • 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:48 ryankemper: T267927 pooled `wdqs1005` (all caught up on lag)
  • 22:46 ryankemper: T280108 T267927 Manually re-enabled and ran puppet on `wdqs1005` (had closed the tmux pane which terminated the cookbook without letting it do its final cleanup)
  • 22:33 ryankemper: T280108 T267927 Data transfers completed successfully; small issue with new `wait_for_updater` logic is preventing termination so I ctrl+c'd manually
  • 22:32 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 20:03 herron: migrating kafka-logging broker logstash1012 to kafka-logging1003 T279342
  • 19:56 Trey314159: reindexing wikidata on cloudelastic finished/failed (T274200)
  • 19:43 Trey314159: reindexing wikidata on cloudelastic (T274200)
  • 19:42 Trey314159: reindexing commons and wikidata on elastic@eqiad (T274200)
  • 19:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.1 refs T278345
  • 18:49 andrew@deploy1002: Finished deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev (duration: 01m 58s)
  • 18:47 andrew@deploy1002: Started deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev
  • 18:39 jdrewniak@deploy1002: Synchronized private/readme.php: Config: Add $wgWMEVectorPrefDiffSalt to private/readme (T261842) (duration: 01m 08s)
  • 18:32 jdrewniak@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mediawiki.pref_diff stream to wgEventLoggingStreamNames/wgEventStreams (T261842) (duration: 01m 18s)
  • 17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:42 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 crusnov@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
  • 16:21 ryankemper: T280108 T267927 Current wdqs transfers in progress: `wdqs1004`->`wdqs1005`, `wdqs2008`->`wdqs2001`
  • 16:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
  • 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 16:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 16:17 ryankemper: T280108 T267927 Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679702 and ran puppet-agent on `cumin2001` before next round of wdqs `data-transfer`s
  • 16:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 16:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 16:02 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 15:26 otto@deploy1002: Finished deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided) (duration: 04m 44s)
  • 15:21 otto@deploy1002: Started deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided)
  • 15:09 elukey@deploy1002: Finished deploy [analytics/refinery@497f6a5]: Regular analytics weekly train (duration: 13m 12s)
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
  • 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
  • 14:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
  • 14:56 elukey@deploy1002: Started deploy [analytics/refinery@497f6a5]: Regular analytics weekly train
  • 14:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
  • 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
  • 14:47 jayme: imported etcd-mirror_0.0.5-1 to buster-wikimedia
  • 14:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
  • 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
  • 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
  • 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
  • 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
  • 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
  • 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
  • 14:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
  • 14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
  • 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
  • 14:19 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: T271983, try again (duration: 07m 45s)
  • 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
  • 14:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 14:12 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: T271983, try again
  • 14:11 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: T271983 (duration: 11m 15s)
  • 14:09 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 14:00 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: T271983
  • 13:56 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[5-7].eqiad.wmnet
  • 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
  • 13:54 andrewbogott: upgrading packages and mediawiki on wikitech-static
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
  • 13:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
  • 13:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
  • 13:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
  • 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
  • 13:32 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
  • 13:25 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
  • 13:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
  • 13:18 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
  • 13:13 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
  • 13:13 XioNoX: redirect ns2 to dns3002
  • 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
  • 13:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
  • 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
  • 13:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
  • 12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1045.eqiad.wmnet with reason: REIMAGE
  • 12:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1044.eqiad.wmnet with reason: REIMAGE
  • 12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1045.eqiad.wmnet with reason: REIMAGE
  • 12:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
  • 12:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1043.eqiad.wmnet with reason: REIMAGE
  • 12:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1044.eqiad.wmnet with reason: REIMAGE
  • 12:54 XioNoX: redirect ns2 to dns3001
  • 12:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1043.eqiad.wmnet with reason: REIMAGE
  • 12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
  • 12:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
  • 12:37 XioNoX: redirect ns0 to authdns2001
  • 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
  • 12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
  • 12:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[0-2].eqiad.wmnet
  • 12:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
  • 12:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
  • 12:12 XioNoX: redirect ns1 to authdns1001
  • 12:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 11:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 11:45 hnowlan: restarting restbase1016 for kernel update
  • 11:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1042.eqiad.wmnet with reason: REIMAGE
  • 11:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1041.eqiad.wmnet with reason: REIMAGE
  • 11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1042.eqiad.wmnet with reason: REIMAGE
  • 11:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1040.eqiad.wmnet with reason: REIMAGE
  • 11:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1041.eqiad.wmnet with reason: REIMAGE
  • 11:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1040.eqiad.wmnet with reason: REIMAGE
  • 11:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1004.eqiad.wmnet with reason: restarting for kernel update
  • 11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1004.eqiad.wmnet with reason: restarting for kernel update
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6748a7f: Add *.jfklibrary.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T279506) (duration: 01m 51s)
  • 11:14 arturo: merging homer changes for cr-codgw (T280225)
  • 11:14 arturo: merging homer changes for cr-eqiad (T280225)
  • 10:59 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp103[7-9].eqiad.wmnet
  • 10:54 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 10:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 10:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 10:21 elukey: Add kafka-logging100{2,3} to the kafka term in the analytics filters on cr1/cr2 eqiad - ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/679740
  • 10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15368 and previous config saved to /var/cache/conftool/dbconfig/20210415-095031-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15367 and previous config saved to /var/cache/conftool/dbconfig/20210415-093633-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15366 and previous config saved to /var/cache/conftool/dbconfig/20210415-093527-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15365 and previous config saved to /var/cache/conftool/dbconfig/20210415-092129-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15364 and previous config saved to /var/cache/conftool/dbconfig/20210415-092024-root.json
  • 09:16 ema: cp-upload: varnishadm -n frontend param.set nuke_limit 1000 T275809
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15363 and previous config saved to /var/cache/conftool/dbconfig/20210415-090625-root.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15362 and previous config saved to /var/cache/conftool/dbconfig/20210415-090520-root.json
  • 09:04 moritzm: installing tomcat security updates
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15361 and previous config saved to /var/cache/conftool/dbconfig/20210415-085122-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15360 and previous config saved to /var/cache/conftool/dbconfig/20210415-085017-root.json
  • 08:48 godog: free space and bounce thanos-compact
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15359 and previous config saved to /var/cache/conftool/dbconfig/20210415-083618-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 5%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15358 and previous config saved to /var/cache/conftool/dbconfig/20210415-082115-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P15357 and previous config saved to /var/cache/conftool/dbconfig/20210415-081127-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15356 and previous config saved to /var/cache/conftool/dbconfig/20210415-080947-root.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15355 and previous config saved to /var/cache/conftool/dbconfig/20210415-075718-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15354 and previous config saved to /var/cache/conftool/dbconfig/20210415-075444-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15353 and previous config saved to /var/cache/conftool/dbconfig/20210415-074214-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15352 and previous config saved to /var/cache/conftool/dbconfig/20210415-073940-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15351 and previous config saved to /var/cache/conftool/dbconfig/20210415-072711-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15350 and previous config saved to /var/cache/conftool/dbconfig/20210415-072436-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146 (s2,s4) to upgrade kernel', diff saved to https://phabricator.wikimedia.org/P15348 and previous config saved to /var/cache/conftool/dbconfig/20210415-071600-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15347 and previous config saved to /var/cache/conftool/dbconfig/20210415-071207-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15346 and previous config saved to /var/cache/conftool/dbconfig/20210415-065704-root.json
  • 06:33 ryankemper: T280108 T267927 `data-transfer` to `wdqs1004` was successful; cookbook failed due to a newly introduced minor type error that didn't affect the transfer itself
  • 06:32 elukey: move hue.wikimedia.org to an-tool1009 (from analytics-tool1001)
  • 06:00 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 05:54 Amir1: end of cleaning archive of pywikibot-bugs and wikidata-bugs T262773
  • 05:44 Amir1: start deleting archive of wikidata-bugs T262773
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1179 T275633', diff saved to https://phabricator.wikimedia.org/P15344 and previous config saved to /var/cache/conftool/dbconfig/20210415-050239-marostegui.json
  • 04:14 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 04:14 ryankemper: T280108 T267927 `wdqs2008` (source) caught up on lag, xfering to `wdqs1004`: `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring wikidata journal following reload from dumps" --blazegraph_instance blazegraph --task-id T267927`
  • 04:06 ryankemper: T280108 T267927 Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679320, will verify correct behavior of `data-transfer` cookbook
  • 01:19 Amir1: mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki --property-id P8671 --new-data-type external-id (T278427)
  • 00:50 ejegg: updated fundraising CiviCRM from c3342aa4ea to 35a8dd33ba
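
The dbctl entries above show instances being repooled in stages (for example db1146:3312 at 10%, 25%, 50%, 75%, then 100%, roughly 15 minutes apart). A minimal sketch for extracting that ramp from log lines follows. It assumes the entry format seen in this log; `REPOOL_RE` and `repool_steps` are hypothetical helper names, not part of dbctl or any Wikimedia tooling, and the sample lines are abbreviated copies of the entries above.

```python
import re

# Matches entries such as:
#   09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: ...
# The format is inferred from this log only; it is an assumption, not a documented spec.
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>[\w:]+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {instance: [(time, percent), ...]} in chronological order."""
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            entry = (match["time"], int(match["pct"]))
            steps.setdefault(match["instance"], []).append(entry)
    # The log lists the newest entry first, so reverse each list.
    return {inst: list(reversed(found)) for inst, found in steps.items()}


# Abbreviated sample lines (newest first, as in the log).
sample = [
    "09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312'",
    "09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312'",
    "09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312'",
    "09:16 ema: cp-upload: varnishadm -n frontend param.set nuke_limit 1000 T275809",
    "09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312'",
    "08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: Repool db1146:3312'",
]

ramp = repool_steps(sample)
for instance, found in ramp.items():
    print(instance, found)
```

Entries that are not staged repools (depools, cookbook runs, manual notes) do not match the pattern and are skipped.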

2021-04-14

  • 23:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript global variables in ruwiki (T72470) (duration: 01m 16s)
  • 21:44 legoktm: manually started debmonitor-client.service on ml-serve2004 after 502 Bad gateway error
  • 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wtp[1037-1039].eqiad.wmnet with reason: reimage
  • 20:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wtp[1037-1039].eqiad.wmnet with reason: reimage
  • 20:38 mutante: wtp1037, wtp1038, wtp1039 - scap pull
  • 19:52 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
  • 19:52 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
  • 19:51 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2410.codfw.wmnet,cluster=videoscaler
  • 19:51 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2411.codfw.wmnet,cluster=videoscaler
  • 19:50 cstone: civicrm revision changed from ec2a3bcff6 to c3342aa4ea
  • 19:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2411.codfw.wmnet,cluster=videoscaler
  • 19:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2410.codfw.wmnet,cluster=videoscaler
  • 19:49 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2395.codfw.wmnet,cluster=videoscaler
  • 19:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2394.codfw.wmnet,cluster=videoscaler
  • 19:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2411.codfw.wmnet,cluster=jobrunner
  • 19:42 herron: migrating kafka-logging broker logstash1011 to kafka-logging1002 T279342
  • 19:06 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.1 refs T278345 (duration: 02m 03s)
  • 19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.1 refs T278345
  • 18:58 mutante: urldownloader1002 - icinga alerted about disk space, ran 'apt-get clean', which is my usual go-to in that case; it reduced usage from 97% to 89%
  • 17:56 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: ce44792: 84107c5: GrowthExperiments backports related to DatabaseMentorStore (T279957; T279959) (duration: 01m 55s)
  • 15:00 shdubsh: run new curator actions on codfw - T274394
  • 14:48 shdubsh: O:logstash::elasticsearch7 update elasticsearch-curator to 5.8.1
  • 14:13 rzl: mcrouter cert renewal complete, puppet re-enabled T276029
  • 14:11 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@8ae53e3]: T273847 export queries to relforge dag deployment - start date update (duration: 02m 14s)
  • 14:11 moritzm: installing intel-microcode updates on Buster
  • 14:09 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@8ae53e3]: T273847 export queries to relforge dag deployment - start date update
  • 13:48 rzl: disabling puppet on C:mcrouter for cert renewal T276029
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master', diff saved to https://phabricator.wikimedia.org/P15342 and previous config saved to /var/cache/conftool/dbconfig/20210414-134331-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15341 and previous config saved to /var/cache/conftool/dbconfig/20210414-133411-root.json
  • 13:29 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@825c60a]: T273847 export queries to relforge dag deployment - schedule change (duration: 02m 08s)
  • 13:27 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@825c60a]: T273847 export queries to relforge dag deployment - schedule change
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15340 and previous config saved to /var/cache/conftool/dbconfig/20210414-131908-root.json
  • 13:12 moritzm: installing OpenSSL updates on buster
  • 13:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15339 and previous config saved to /var/cache/conftool/dbconfig/20210414-130404-root.json
  • 13:02 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 13:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:01 godog: extend prometheus global @ codfw by 100G
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15338 and previous config saved to /var/cache/conftool/dbconfig/20210414-124901-root.json
  • 12:39 elukey: update kafka term for analytics-in{4,6} on cr{1,2}-eqiad to include kafka-logging1001 - ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/679296
  • 12:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1039.eqiad.wmnet with reason: REIMAGE
  • 12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1039.eqiad.wmnet with reason: REIMAGE
  • 12:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1038.eqiad.wmnet with reason: REIMAGE
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15337 and previous config saved to /var/cache/conftool/dbconfig/20210414-123357-root.json
  • 12:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1037.eqiad.wmnet with reason: REIMAGE
  • 12:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1038.eqiad.wmnet with reason: REIMAGE
  • 12:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1037.eqiad.wmnet with reason: REIMAGE
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15336 and previous config saved to /var/cache/conftool/dbconfig/20210414-122727-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15335 and previous config saved to /var/cache/conftool/dbconfig/20210414-122108-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15334 and previous config saved to /var/cache/conftool/dbconfig/20210414-121223-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 for kernel and mysql upgrade T279281', diff saved to https://phabricator.wikimedia.org/P15333 and previous config saved to /var/cache/conftool/dbconfig/20210414-120724-marostegui.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15332 and previous config saved to /var/cache/conftool/dbconfig/20210414-120604-root.json
  • 12:03 marostegui: Upgrade mysql on db1080 T279281
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15331 and previous config saved to /var/cache/conftool/dbconfig/20210414-115720-root.json
  • 11:53 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=(wtp1034|wtp1035|wtp1036).eqiad.wmnet
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15330 and previous config saved to /var/cache/conftool/dbconfig/20210414-115101-root.json
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15329 and previous config saved to /var/cache/conftool/dbconfig/20210414-114216-root.json
  • 11:41 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15328 and previous config saved to /var/cache/conftool/dbconfig/20210414-113714-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15327 and previous config saved to /var/cache/conftool/dbconfig/20210414-113557-root.json
  • 11:31 marostegui: Upgrade kernel on db1096 (s5, s6)
  • 11:29 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6) kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15326 and previous config saved to /var/cache/conftool/dbconfig/20210414-112619-marostegui.json
  • 11:25 hnowlan: regenerated certificates for restbase1019/restbase102[0-7]
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 90%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15325 and previous config saved to /var/cache/conftool/dbconfig/20210414-112211-root.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 80%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15323 and previous config saved to /var/cache/conftool/dbconfig/20210414-110706-root.json
  • 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE
  • 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 11:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE
  • 11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 11:03 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 11:03 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE
  • 11:03 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 11:03 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 11:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 11:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE
  • 11:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE
  • 10:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 70%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15322 and previous config saved to /var/cache/conftool/dbconfig/20210414-105202-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 60%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15321 and previous config saved to /var/cache/conftool/dbconfig/20210414-103659-root.json
  • 10:30 marostegui: Failover m1 from db1080 to db1159 - T276448
  • 10:25 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus
  • 10:25 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15320 and previous config saved to /var/cache/conftool/dbconfig/20210414-102153-root.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 40%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15319 and previous config saved to /var/cache/conftool/dbconfig/20210414-100649-root.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 30%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15318 and previous config saved to /var/cache/conftool/dbconfig/20210414-095146-root.json
  • 09:37 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 20%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15317 and previous config saved to /var/cache/conftool/dbconfig/20210414-093642-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15316 and previous config saved to /var/cache/conftool/dbconfig/20210414-093305-marostegui.json
  • 09:29 gehel: depooling wdqs1004 - corrupted data after data reload
  • 09:27 effie: disable puppet on all mediawiki servers to merge 676580
  • 09:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/Hooks/HookUtils.php: e4b2d93: Dont allow query and cookie hacks to enable topic subscriptions (T280082) (duration: 01m 24s)
  • 09:23 gehel: repooling wdqs1013, caught up on lag
  • 09:22 gehel: depooling wdqs1003 - corrupted data after data reload
  • 09:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kraz.wikimedia.org
  • 09:16 gehel: restarting blazegraph on wdqs1003
  • 09:12 ryankemper: T267927 depooled `wdqs1004` following data transfer (catching up on lag), current round of data transfers is done so there shouldn't be any left to depool
  • 09:10 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 09:09 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts kraz.wikimedia.org
  • 09:06 ryankemper: T267927 depool `wdqs2001` following data transfer (catching up on lag)
  • 09:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast1002.wikimedia.org
  • 09:03 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:53 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast1002.wikimedia.org
  • 08:44 Urbanecm: Run scap pull on mwdebug1002
  • 08:40 Urbanecm: Staging on mwdebug1002
  • 08:20 akosiaris@cumin1001: conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2394.codfw.wmnet
  • 08:20 akosiaris@cumin1001: conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2395.codfw.wmnet
  • 08:16 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=(wtp1033.eqiad.wmnet|wtp1032.eqiad.wmnet)
  • 08:07 jayme: updated chartmuseum to 0.13.1 on chartmuseum1001, chartmuseum2001
  • 08:06 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
  • 08:05 gehel: depooling wdqs2004 - catching up on lag
  • 08:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:59 gehel: depooling wdqs2001 - catching up on lag
  • 07:57 gehel: depooling wdqs1013 - catching up on lag
  • 07:56 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:55 gehel: restarting blazegraph + updater on wdqs1013
  • 07:51 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
  • 07:51 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
  • 07:42 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
  • 07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 07:42 jayme: imported chartmuseum_0.13.1-1 to buster-wikimedia
  • 07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 07:41 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
  • 07:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 07:40 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 07:40 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:22 XioNoX: push pfw policy - T280059
  • 06:47 eileen: civicrm revision changed from 649e415c07 to ec2a3bcff6, config revision is c5fc1b91e0
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15314 and previous config saved to /var/cache/conftool/dbconfig/20210414-062549-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15313 and previous config saved to /var/cache/conftool/dbconfig/20210414-052959-marostegui.json
  • 05:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1076.eqiad.wmnet
  • 05:08 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 05:08 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 05:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 05:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 05:04 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1076.eqiad.wmnet
  • 04:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 04:42 eileen: civicrm revision changed from a4c1a7b842 to 649e415c07, config revision is c5fc1b91e0
  • 02:54 andrew@deploy1002: Finished deploy [horizon/deploy@ef844a1]: fix for T276963 (duration: 04m 10s)
  • 02:49 andrew@deploy1002: Started deploy [horizon/deploy@ef844a1]: fix for T276963
  • 00:11 legoktm@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
  • 00:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
  • 00:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
  • 00:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw2410.codfw.wmnet

2021-04-13

  • 23:27 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: Broadcast IRC events to irc1001 instead of kraz (T224579) (duration: 01m 06s)
  • 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Unset $wmgUseWikimediaShopLink for ptwiki (T279877) (duration: 01m 06s)
  • 23:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: ExtensionDistributor: Add REL1_36 (duration: 02m 03s)
  • 22:41 mutante: welcome new deployer Silvan Heintze (sihe) (T279764)
  • 22:40 cstone: civicrm revision changed from 76bd8ff009 to a4c1a7b842
  • 22:08 ejegg: updated payments-wiki from 70f5163816 to 9a4eef1375
  • 22:06 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
  • 22:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
  • 22:04 Urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki]$ foreachwikiindblist growthexperiments sql.php php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentor_mentee.sql # T278573
  • 21:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
  • 21:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
  • 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2394.codfw.wmnet,service=jobrunner
  • 21:45 mutante: mw2394, mw2395 - scap pull
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2395.codfw.wmnet
  • 21:35 mutante: mw2394 - rebooting
  • 21:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
  • 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
  • 21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
  • 21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
  • 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
  • 20:58 mutante: mw2394, mw2395 - reimaging as jobrunners (T279100)
  • 20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2394-2395].codfw.wmnet with reason: reimage
  • 20:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2394-2395].codfw.wmnet with reason: reimage
  • 20:47 mutante: [kubemaster1001:~] $ sudo kubectl delete pod linkrecommendation-production-load-datasets-1618311600-hn6k8 -n linkrecommendation (T280076)
  • 19:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 19:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 19:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 19:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 19:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1033.eqiad.wmnet with reason: REIMAGE
  • 19:29 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 19:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1032.eqiad.wmnet with reason: REIMAGE
  • 19:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1033.eqiad.wmnet with reason: REIMAGE
  • 19:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1032.eqiad.wmnet with reason: REIMAGE
  • 19:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2020.codfw.wmnet with reason: REIMAGE
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2020.codfw.wmnet with reason: REIMAGE
  • 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.1
  • 18:45 jhuneidi@deploy1002: Pruned MediaWiki: 1.36.0-wmf.37 (duration: 03m 16s)
  • 18:11 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.1 (duration: 30m 36s)
  • 18:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1031.eqiad.wmnet with reason: REIMAGE
  • 17:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1031.eqiad.wmnet with reason: REIMAGE
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1030.eqiad.wmnet with reason: REIMAGE
  • 17:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1030.eqiad.wmnet with reason: REIMAGE
  • 17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2019.codfw.wmnet with reason: REIMAGE
  • 17:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:54 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2018.codfw.wmnet with reason: REIMAGE
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2019.codfw.wmnet with reason: REIMAGE
  • 17:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2017.codfw.wmnet with reason: REIMAGE
  • 17:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2018.codfw.wmnet with reason: REIMAGE
  • 17:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2017.codfw.wmnet with reason: REIMAGE
  • 17:41 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.1
  • 17:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:28 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 17:21 mutante: gerrit1001 - remove /var/lib/gerrit2/review_site/static/gerrit-theme.html after https://gerrit.wikimedia.org/r/c/operations/puppet/+/678646
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15311 and previous config saved to /var/cache/conftool/dbconfig/20210413-163851-root.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 90%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15310 and previous config saved to /var/cache/conftool/dbconfig/20210413-162347-root.json
  • 16:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1029.eqiad.wmnet with reason: REIMAGE
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 80%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15309 and previous config saved to /var/cache/conftool/dbconfig/20210413-160844-root.json
  • 16:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1028.eqiad.wmnet with reason: REIMAGE
  • 16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1029.eqiad.wmnet with reason: REIMAGE
  • 16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2016.codfw.wmnet with reason: REIMAGE
  • 16:03 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1028.eqiad.wmnet with reason: REIMAGE
  • 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2015.codfw.wmnet with reason: REIMAGE
  • 16:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2016.codfw.wmnet with reason: REIMAGE
  • 16:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2014.codfw.wmnet with reason: REIMAGE
  • 16:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2015.codfw.wmnet with reason: REIMAGE
  • 15:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2014.codfw.wmnet with reason: REIMAGE
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 70%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15308 and previous config saved to /var/cache/conftool/dbconfig/20210413-155340-root.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 60%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15307 and previous config saved to /var/cache/conftool/dbconfig/20210413-153836-root.json
  • 15:26 herron: migrating kafka-logging broker logstash1010 to kafka-logging1001 T279342
  • 15:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15306 and previous config saved to /var/cache/conftool/dbconfig/20210413-152333-root.json
  • 15:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:12 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (with some failures) (T274200)
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 40%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15305 and previous config saved to /var/cache/conftool/dbconfig/20210413-150829-root.json
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 30%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15304 and previous config saved to /var/cache/conftool/dbconfig/20210413-145325-root.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 20%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15303 and previous config saved to /var/cache/conftool/dbconfig/20210413-143821-root.json
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15302 and previous config saved to /var/cache/conftool/dbconfig/20210413-143419-marostegui.json
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1027.eqiad.wmnet with reason: REIMAGE
  • 14:09 moritzm: updated bullseye d-i image to 2021-04-12 daily build T275873
  • 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1027.eqiad.wmnet with reason: REIMAGE
  • 14:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1026.eqiad.wmnet with reason: REIMAGE
  • 14:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1026.eqiad.wmnet with reason: REIMAGE
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15301 and previous config saved to /var/cache/conftool/dbconfig/20210413-140431-marostegui.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 20%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15300 and previous config saved to /var/cache/conftool/dbconfig/20210413-140353-root.json
  • 14:03 _joe_: uploading new versions of the mcrouter, php7.2-fpm and php7.3-fpm images to the registry
  • 14:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2013.codfw.wmnet with reason: REIMAGE
  • 13:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2012.codfw.wmnet with reason: REIMAGE
  • 13:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2013.codfw.wmnet with reason: REIMAGE
  • 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2011.codfw.wmnet with reason: REIMAGE
  • 13:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2012.codfw.wmnet with reason: REIMAGE
  • 13:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2011.codfw.wmnet with reason: REIMAGE
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15299 and previous config saved to /var/cache/conftool/dbconfig/20210413-133644-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 90%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15298 and previous config saved to /var/cache/conftool/dbconfig/20210413-132140-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 80%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15297 and previous config saved to /var/cache/conftool/dbconfig/20210413-130637-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15296 and previous config saved to /var/cache/conftool/dbconfig/20210413-125652-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 70%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15295 and previous config saved to /var/cache/conftool/dbconfig/20210413-125133-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15294 and previous config saved to /var/cache/conftool/dbconfig/20210413-123629-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15293 and previous config saved to /var/cache/conftool/dbconfig/20210413-122248-marostegui.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15292 and previous config saved to /var/cache/conftool/dbconfig/20210413-122126-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1076 from dbctl T274752', diff saved to https://phabricator.wikimedia.org/P15291 and previous config saved to /var/cache/conftool/dbconfig/20210413-122119-marostegui.json
  • 12:13 dcausse: deleting stale wikidata indices on cloudelastic (T231517)
  • 11:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2010.codfw.wmnet with reason: REIMAGE
  • 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2009.codfw.wmnet with reason: REIMAGE
  • 11:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2010.codfw.wmnet with reason: REIMAGE
  • 11:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2008.codfw.wmnet with reason: REIMAGE
  • 11:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2009.codfw.wmnet with reason: REIMAGE
  • 11:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2008.codfw.wmnet with reason: REIMAGE
  • 11:17 jbond42: switch debmonitor internal service to apache
  • 11:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript variables in zhwiki (T72470) (duration: 00m 57s)
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15289 and previous config saved to /var/cache/conftool/dbconfig/20210413-105625-root.json
  • 10:55 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:55 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:43 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:43 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 30%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15288 and previous config saved to /var/cache/conftool/dbconfig/20210413-104121-root.json
  • 10:39 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 10:35 jbond42: switch debmonitor internal interface to use apache
  • 10:33 moritzm: restarting FPM on mw canaries to pick up OpenSSL updates
  • 10:31 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
  • 10:28 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@3227eea]: (no justification provided) (duration: 03m 08s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15287 and previous config saved to /var/cache/conftool/dbconfig/20210413-102617-root.json
  • 10:25 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@3227eea]: (no justification provided)
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15286 and previous config saved to /var/cache/conftool/dbconfig/20210413-095717-marostegui.json
  • 09:41 ema: cp[5002-5006]: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches T275809
  • 09:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbmonitor1001.wikimedia.org
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15285 and previous config saved to /var/cache/conftool/dbconfig/20210413-093208-root.json
  • 09:22 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15284 and previous config saved to /var/cache/conftool/dbconfig/20210413-091704-root.json
  • 09:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2007.codfw.wmnet with reason: REIMAGE
  • 09:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2006.codfw.wmnet with reason: REIMAGE
  • 09:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2007.codfw.wmnet with reason: REIMAGE
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15283 and previous config saved to /var/cache/conftool/dbconfig/20210413-091414-marostegui.json
  • 09:13 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2005.codfw.wmnet with reason: REIMAGE
  • 09:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2006.codfw.wmnet with reason: REIMAGE
  • 09:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2005.codfw.wmnet with reason: REIMAGE
  • 09:06 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbmonitor1001.wikimedia.org
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15282 and previous config saved to /var/cache/conftool/dbconfig/20210413-090201-root.json
  • 08:59 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 08:59 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15281 and previous config saved to /var/cache/conftool/dbconfig/20210413-085057-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15280 and previous config saved to /var/cache/conftool/dbconfig/20210413-084657-root.json
  • 08:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ca7673: mswiki: Fix help panel links (T277562) (duration: 00m 58s)
  • 08:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:16 marostegui: Restart sanitarium hosts db1124, db1125, db1154, db1155, db2094, db2095 T279587
  • 08:09 akosiaris: Remove system maintenance message from OTRS. Migration to Znuny 6.0.33 done. T279303
  • 08:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2004.codfw.wmnet with reason: REIMAGE
  • 08:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2003.codfw.wmnet with reason: REIMAGE
  • 08:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2004.codfw.wmnet with reason: REIMAGE
  • 07:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2002.codfw.wmnet with reason: REIMAGE
  • 07:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2003.codfw.wmnet with reason: REIMAGE
  • 07:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2002.codfw.wmnet with reason: REIMAGE
  • 07:49 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=(mw1311.eqiad.wmnet|mw1318.eqiad.wmnet|mw1334.eqiad.wmnet)
  • 07:46 akosiaris: Start up all components on otrs1001. T279303
  • 07:38 jiji@cumin1001: conftool action : set/weight=10; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 07:38 jiji@cumin1001: conftool action : set/weight=10; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
  • 07:30 akosiaris: migrating to Znuny-6.0.33, release 2021-03-10 . T279303
  • 07:26 akosiaris: shutdown all OTRS components on otrs1001, prep for OTRS -> Znuny migration. T279303
  • 05:56 _joe_: restarting blazegraph on wdqs1013
  • 05:44 eileen: civicrm revision changed from ecc32d2a35 to 76bd8ff009, config revision is c5fc1b91e0
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15278 and previous config saved to /var/cache/conftool/dbconfig/20210413-045708-marostegui.json

2021-04-12

  • 23:25 krinkle@deploy1002: Synchronized wmf-config/mc.php: I390b47 (duration: 00m 58s)
  • 23:06 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production (T269712) (duration: 00m 58s)
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 117743f: Enable assignment of importupload on enwikibooks (T278683) (duration: 00m 57s)
  • 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a1949fd: Add extendedconfirmed on svwiki (T279836) (duration: 00m 59s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d275ec: Add abusefilter-maintainer to wmgPrivilegedGlobalGroups (T279835) (duration: 00m 58s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 13b10d3: Enable <mapframe> on bswiki (T279635) (duration: 00m 57s)
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ae05f7c: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299) (duration: 00m 57s)
  • 18:03 urbanecm@deploy1002: sync-file aborted: ae05f7c: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299) (duration: 00m 00s)
  • 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript in jawiki (T72470) (duration: 00m 56s)
  • 11:26 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/FlaggedRevs/frontend/FlaggedRevsXML.php: Backport: Don't do strict equal condition check (T279750) (duration: 00m 57s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NO-OP: 6c03d6a: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD (T279853) (duration: 00m 58s)
  • 11:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikidata: post edit constraint jobs on 60% of edits (T204031) (duration: 01m 13s)
  • 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove all remains of idGeneratorLogging (T274156) (2/2, Beta-only) (duration: 00m 56s)
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove all remains of idGeneratorLogging (T274156) (1/2) (duration: 00m 57s)
  • 11:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove idGeneratorLogging (T274156) (duration: 00m 58s)
  • 11:00 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T279398 T279419) (duration: 00m 58s)
  • 10:59 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T279398 T279419) (duration: 00m 58s)
  • 09:55 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 57s)
  • 09:44 Urbanecm: Start server-side upload for 4 video files #2 (T279878, T279839, T279818)
  • 08:43 Urbanecm: Start server-side upload for 4 video files (T279878, T279839, T279818)
  • 08:08 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 08:07 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
  • 08:07 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1311.eqiad.wmnet
  • 08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1318.eqiad.wmnet
  • 08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1334.eqiad.wmnet
  • 08:05 vgutierrez: restart acme-chief

2021-04-10

  • 14:21 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: fix for T279699 (duration: 04m 12s)
  • 14:17 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: fix for T279699
  • 14:11 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 02m 21s)
  • 14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699
  • 14:08 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 00m 11s)
  • 14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699

2021-04-09

  • 14:07 jynus: retry es4 backup dump on eqiad (backup1002)
  • 01:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
  • 01:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
  • 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
  • 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
  • 00:49 legoktm: imported mailman3 backports on apt.wm.o (T278905)

2021-04-08

  • 23:48 brennen@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/WikibaseMediaInfo/resources/mediasearch-vue/store/actions.js: Backport: Do not show "invalid search" message when request is aborted by user (T277714) (duration: 00m 57s)
  • 22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
  • 21:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
  • 21:56 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
  • 21:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
  • 21:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
  • 21:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
  • 21:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
  • 21:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
  • 21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
  • 21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
  • 21:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
  • 21:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
  • 21:46 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 21:46 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 21:38 andrew@deploy1002: Finished deploy [horizon/deploy@3abe9d0]: Fix for T279667 (duration: 03m 52s)
  • 21:34 andrew@deploy1002: Started deploy [horizon/deploy@3abe9d0]: Fix for T279667
  • 21:33 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 20:33 mutante: mw2403 through mw2411 pooled and set to active state in netbox (T279599)
  • 20:32 mutante: mw2403 through mw2411 - pooled and set to active state in netbox (T279599)
  • 20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw240[3-9].codfw.wmnet
  • 20:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw241[0-1].codfw.wmnet
  • 20:27 legoktm: legoktm@deploy1002:~$ cat deb-parsoid-urls.txt | mwscript purgeList.php --wiki=aawiki # to clear releases.wm.o/debian/ cache
  • 20:02 legoktm: imported parsoid_0.11.1all_all.deb to releases.wikimedia.org apt repo
  • 19:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw241[0-1].codfw.wmnet
  • 19:58 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw241[0-1].codfw.wmnet
  • 19:57 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw238[0-2].codfw.wmnet
  • 19:56 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2379.codfw.wmnet
  • 19:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw240[3-9].codfw.wmnet
  • 19:54 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw240[3-9].codfw.wmnet
  • 19:50 mutante: mw2403 through mw2411 - scap pull - new hardware
  • 19:35 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.38
  • 18:52 phuedx: phuedx@deploy1002 Synchronized private/PrivateSettings.php: PrivateSettings: Add value for $wgWMEVectorPrefDiffSalt (T261842)
  • 18:51 phuedx@deploy1002: Synchronized private/PrivateSettings.php: PrivateSettings: Add value for $wgWMEVectorPrefDiffSalt (T261842) (duration: 01m 06s)
  • 18:37 mutante: mw2403 through mw2411 - serial rebooting
  • 18:31 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 18:31 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 18:29 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWBackTool.js: e0f3735: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working (T279613) (duration: 01m 07s)
  • 18:25 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 18:25 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 18:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
  • 18:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
  • 18:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: new_install
  • 18:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: new_install
  • 18:03 mutante: mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
  • 18:02 mutante: mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
  • 17:59 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 17:52 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 17:29 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:23 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:18 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:16 dancy: Scap 3.17.0 deployed to beta cluster
  • 16:51 dancy: testing Scap 3.17.0 release on deployment-deploy01
  • 16:33 elukey: reboot an-worker1100 again to check if all the disks come up correctly
  • 16:16 cmjohnson1: update bios cp1087, already depooled for h/w issues T278729
  • 16:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
  • 16:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
  • 16:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:36 elukey: reboot an-worker1100 to see if it helps with the strange BBU behavior
  • 13:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon2001-dev.codfw.wmnet
  • 13:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephmon2001-dev.codfw.wmnet
  • 13:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
  • 13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
  • 13:24 moritzm: installing groff bugfix updates from Buster point release
  • 12:49 ema: cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
  • 12:44 moritzm: installing libbsd security updates for Buster
  • 12:39 moritzm: installing xcftools security updates
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15264 and previous config saved to /var/cache/conftool/dbconfig/20210408-123137-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15263 and previous config saved to /var/cache/conftool/dbconfig/20210408-121633-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15262 and previous config saved to /var/cache/conftool/dbconfig/20210408-120128-root.json
  • 11:58 XioNoX: tighten all routers loopback firewall filter - T207799
  • 11:57 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 00m 09s)
  • 11:57 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
  • 11:50 XioNoX: tighten cr3-ulsfo loopback firewall filter - T207799
  • 11:49 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 01m 39s)
  • 11:47 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15261 and previous config saved to /var/cache/conftool/dbconfig/20210408-114625-root.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15259 and previous config saved to /var/cache/conftool/dbconfig/20210408-112332-root.json
  • 11:09 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2028.codfw.wmnet
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15258 and previous config saved to /var/cache/conftool/dbconfig/20210408-110828-root.json
  • 11:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: de1670c: Enable Growth for newcomers on simplewiki, mswiki, tawiki (T278369; T277562; T277550) (duration: 01m 07s)
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15257 and previous config saved to /var/cache/conftool/dbconfig/20210408-105324-root.json
  • 10:47 effie: disable puppet on parsoid* servers
  • 10:41 XioNoX: enable sampling on all routers FPCs
  • 10:40 marostegui: Upgrade db2085's kernel
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15256 and previous config saved to /var/cache/conftool/dbconfig/20210408-103821-root.json
  • 10:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:32 XioNoX: enable sampling on cr1-codfw:fpc0
  • 10:30 marostegui: Upgrade kernel on db1118
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15255 and previous config saved to /var/cache/conftool/dbconfig/20210408-102855-marostegui.json
  • 10:27 effie: enable puppet on all mw* servers
  • 10:27 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15254 and previous config saved to /var/cache/conftool/dbconfig/20210408-101702-root.json
  • 10:17 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 10:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15253 and previous config saved to /var/cache/conftool/dbconfig/20210408-101303-marostegui.json
  • 10:11 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update (duration: 01m 37s)
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15252 and previous config saved to /var/cache/conftool/dbconfig/20210408-101119-root.json
  • 10:10 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:09 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1180 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15251 and previous config saved to /var/cache/conftool/dbconfig/20210408-100829-marostegui.json
  • 10:07 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15250 and previous config saved to /var/cache/conftool/dbconfig/20210408-100159-root.json
  • 09:58 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:56 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15249 and previous config saved to /var/cache/conftool/dbconfig/20210408-095615-root.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15248 and previous config saved to /var/cache/conftool/dbconfig/20210408-094655-root.json
  • 09:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1177 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15247 and previous config saved to /var/cache/conftool/dbconfig/20210408-094218-marostegui.json
  • 09:42 effie: disable puppet in mw* servers for 677114
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15246 and previous config saved to /var/cache/conftool/dbconfig/20210408-094112-root.json
  • 09:36 Urbanecm: Retry server-side upload for T279192
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15244 and previous config saved to /var/cache/conftool/dbconfig/20210408-093151-root.json
  • 09:30 moritzm: installing openssl updates for buster
  • 09:29 akosiaris@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:29 akosiaris@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:27 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix (duration: 01m 48s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15243 and previous config saved to /var/cache/conftool/dbconfig/20210408-092608-root.json
  • 09:25 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix
  • 09:24 moritzm: installing libzstd security updates on buster
  • 09:20 ema: cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
  • 09:14 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:14 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:09 moritzm: installing underscore security updates on stretch
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15242 and previous config saved to /var/cache/conftool/dbconfig/20210408-085630-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15241 and previous config saved to /var/cache/conftool/dbconfig/20210408-085610-root.json
  • 08:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 08:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 08:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15240 and previous config saved to /var/cache/conftool/dbconfig/20210408-084107-root.json
  • 08:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 08:39 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:38 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 08:38 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:37 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:33 moritzm: installing remaining curl security updates for buster
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15239 and previous config saved to /var/cache/conftool/dbconfig/20210408-082603-root.json
  • 08:24 marostegui: Stop MySQL on all db1117 sections to upgrade kernel
  • 08:17 moritzm: imported postgis 3.1.1+dfsg-1~wmf1 to component/postgis for buster-wikimedia T277064
  • 08:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15238 and previous config saved to /var/cache/conftool/dbconfig/20210408-081059-root.json
  • 08:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15237 and previous config saved to /var/cache/conftool/dbconfig/20210408-075457-root.json
  • 07:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master', diff saved to https://phabricator.wikimedia.org/P15236 and previous config saved to /var/cache/conftool/dbconfig/20210408-074911-marostegui.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15235 and previous config saved to /var/cache/conftool/dbconfig/20210408-074524-marostegui.json
  • 07:42 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15234 and previous config saved to /var/cache/conftool/dbconfig/20210408-073953-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15233 and previous config saved to /var/cache/conftool/dbconfig/20210408-072450-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15232 and previous config saved to /var/cache/conftool/dbconfig/20210408-070946-root.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 to upgrade kernel and mysql, remove weight from es1021, to leave it as it was yesterday T279281', diff saved to https://phabricator.wikimedia.org/P15231 and previous config saved to /var/cache/conftool/dbconfig/20210408-065627-marostegui.json
  • 06:44 elukey@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided) (duration: 02m 20s)
  • 06:41 elukey@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided)
  • 06:33 marostegui: Stop MySQL on db1111 to clone db1177 T275633
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 to clone db1177 T275633', diff saved to https://phabricator.wikimedia.org/P15229 and previous config saved to /var/cache/conftool/dbconfig/20210408-063331-marostegui.json
  • 06:01 kart_: Updated cxserver to 2021-04-07-062518-production (T278141, T263139, T271711, T201491, T240525, T207662)
  • 05:58 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:54 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 02:50 AaronSchulz: Restarted importMissingLocalNames.php (mwmaint 1002, wiki=metawiki,batch-size=1000)

2021-04-07

  • 23:38 ejegg: updated payments-wiki from b06009c099 to 70f5163816
  • 23:35 cstone: civicrm revision changed from eb9379daa3 to fdb4f90c74
  • 23:10 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 321bf91: Wikibase: sample function call counters at 1:100 (T277817) (duration: 01m 08s)
  • 22:49 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST retry1 [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 01m 47s)
  • 22:48 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST retry1 [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
  • 22:33 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 04m 15s)
  • 22:29 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
  • 22:29 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (thin): Regular analytics weekly train THIN [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 00m 07s)
  • 22:29 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (thin): Regular analytics weekly train THIN [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
  • 22:28 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d]: Regular analytics weekly train [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 42m 54s)
  • 22:03 Amir1: clearing watchlist of bots in wikidatawiki (https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=1397670734#Clean_up_watchlist_of_bots)
  • 22:01 legoktm: deployed patch for T279451 (part 2)
  • 21:45 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d]: Regular analytics weekly train [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
  • 21:31 legoktm: deployed patch for T279451
  • 21:22 mutante: mw2397 through mw2402 - pooled as new API appservers after scap pull and all monitoring green (T278396)
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw240[0-2].codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[7-9].codfw.wmnet
  • 21:05 mutante: mw2397 - mw2402 - scap pull
  • 21:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw240[0-2].codfw.wmnet
  • 21:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[7-9].codfw.wmnet
  • 21:04 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw240[0-2].codfw.wmnet
  • 21:03 Amir1: clearing watchlist of bots in enwiki (https://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Noticeboard&oldid=1016563560#Clearing_bot_watchlists)
  • 21:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[7-9].codfw.wmnet
  • 20:58 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[2400-2401].codfw.wmnet with reason: new_install
  • 20:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[2400-2401].codfw.wmnet with reason: new_install
  • 20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[2397-2399].codfw.wmnet with reason: new_install
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[2397-2399].codfw.wmnet with reason: new_install
  • 20:54 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:54 mutante: mw2397 - mw2402 - rebooting
  • 20:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:30 mutante: mw2397 through mw2402 - new hardware moving into production, initial puppet runs as appservers, added to monitoring etc (T278396)
  • 19:47 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2006.codfw.wmnet with reason: REIMAGE
  • 19:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2006.codfw.wmnet with reason: REIMAGE
  • 19:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta (disable LocalisationUpdate extension) (duration: 01m 06s)
  • 19:24 dduvall@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.38 (duration: 01m 06s)
  • 19:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.38
  • 19:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2005.codfw.wmnet with reason: REIMAGE
  • 19:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2005.codfw.wmnet with reason: REIMAGE
  • 18:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2004.codfw.wmnet with reason: REIMAGE
  • 18:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2004.codfw.wmnet with reason: REIMAGE
  • 17:40 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 17:40 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 17:29 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 17:29 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 17:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: REIMAGE
  • 17:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: REIMAGE
  • 16:45 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 15:47 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: REIMAGE
  • 15:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: REIMAGE
  • 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:13 Amir1: setting enwiki and enwikibooks to wmf.38 on mwdebug1002 to test flagged revs
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15228 and previous config saved to /var/cache/conftool/dbconfig/20210407-150436-root.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15227 and previous config saved to /var/cache/conftool/dbconfig/20210407-144933-root.json
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15226 and previous config saved to /var/cache/conftool/dbconfig/20210407-143429-root.json
  • 14:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2411.codfw.wmnet with reason: REIMAGE
  • 14:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2411.codfw.wmnet with reason: REIMAGE
  • 14:19 effie: restarting pybal on lvs2009, lvs1015
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15225 and previous config saved to /var/cache/conftool/dbconfig/20210407-141925-root.json
  • 14:16 effie: restarting pybal on lvs2010, lvs1016
  • 14:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2410.codfw.wmnet with reason: REIMAGE
  • 14:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2410.codfw.wmnet with reason: REIMAGE
  • 13:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2409.codfw.wmnet with reason: REIMAGE
  • 13:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2409.codfw.wmnet with reason: REIMAGE
  • 13:43 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:43 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:42 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:42 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:39 moritzm: imported jenkins 2.277.2 to apt.wikimedia.org (thirdparty/ci) T279033
  • 13:37 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15224 and previous config saved to /var/cache/conftool/dbconfig/20210407-122304-root.json
  • 12:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 12:18 marostegui: Upgrade db1173's kernel
  • 12:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173', diff saved to https://phabricator.wikimedia.org/P15222 and previous config saved to /var/cache/conftool/dbconfig/20210407-121659-marostegui.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15221 and previous config saved to /var/cache/conftool/dbconfig/20210407-120800-root.json
  • 12:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15220 and previous config saved to /var/cache/conftool/dbconfig/20210407-115257-root.json
  • 11:39 marostegui: Deploy schema change on s3 codfw, lag will appear T276150 T276156
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15219 and previous config saved to /var/cache/conftool/dbconfig/20210407-113753-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1184 to s1 depooled T275633', diff saved to https://phabricator.wikimedia.org/P15218 and previous config saved to /var/cache/conftool/dbconfig/20210407-111708-marostegui.json
  • 11:15 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: flaggedrevs: Disable quality and pristine tier in all wikis (T277883) (duration: 02m 15s)
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P15217 and previous config saved to /var/cache/conftool/dbconfig/20210407-105617-marostegui.json
  • 10:51 marostegui: Stop apache on dbmonitor1001 T224589
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15216 and previous config saved to /var/cache/conftool/dbconfig/20210407-103404-marostegui.json
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2106 and db2147 T279406', diff saved to https://phabricator.wikimedia.org/P15215 and previous config saved to /var/cache/conftool/dbconfig/20210407-100147-kormat.json
  • 09:58 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kraz.wikimedia.org
  • 09:58 moritzm: reboot kraz to nudge reconnects to irc2001.w.o for remaining connected clients
  • 09:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host kraz.wikimedia.org
  • 09:40 moritzm: imported git-lfs for bullseye/main (part of standard packages) T275873
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15214 and previous config saved to /var/cache/conftool/dbconfig/20210407-092320-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P15213 and previous config saved to /var/cache/conftool/dbconfig/20210407-091610-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15212 and previous config saved to /var/cache/conftool/dbconfig/20210407-090817-root.json
  • 08:58 moritzm: imported quickstack for bullseye/main (part of standard packages) T275873
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15211 and previous config saved to /var/cache/conftool/dbconfig/20210407-085313-root.json
  • 08:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15210 and previous config saved to /var/cache/conftool/dbconfig/20210407-083809-root.json
  • 08:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
  • 08:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15209 and previous config saved to /var/cache/conftool/dbconfig/20210407-081508-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15207 and previous config saved to /var/cache/conftool/dbconfig/20210407-080537-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15206 and previous config saved to /var/cache/conftool/dbconfig/20210407-080410-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15205 and previous config saved to /var/cache/conftool/dbconfig/20210407-080005-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15204 and previous config saved to /var/cache/conftool/dbconfig/20210407-075034-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15203 and previous config saved to /var/cache/conftool/dbconfig/20210407-074501-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15201 and previous config saved to /var/cache/conftool/dbconfig/20210407-073530-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15200 and previous config saved to /var/cache/conftool/dbconfig/20210407-072957-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15199 and previous config saved to /var/cache/conftool/dbconfig/20210407-072027-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15198 and previous config saved to /var/cache/conftool/dbconfig/20210407-071219-marostegui.json
  • 07:03 gehel: repooling wdqs1005, caught up on lag
  • 06:59 gehel: depooling wdqs1005, restarting blazegraph and waiting for it to catch up on lag
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P15197 and previous config saved to /var/cache/conftool/dbconfig/20210407-065450-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15196 and previous config saved to /var/cache/conftool/dbconfig/20210407-063033-root.json
  • 06:28 moritzm: restarting apache/FPM on mw canaries to pick up curl updates
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15195 and previous config saved to /var/cache/conftool/dbconfig/20210407-062451-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15194 and previous config saved to /var/cache/conftool/dbconfig/20210407-061529-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15193 and previous config saved to /var/cache/conftool/dbconfig/20210407-060948-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15192 and previous config saved to /var/cache/conftool/dbconfig/20210407-060026-root.json
  • 05:54 moritzm: installing curl security updates
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15191 and previous config saved to /var/cache/conftool/dbconfig/20210407-055444-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15190 and previous config saved to /var/cache/conftool/dbconfig/20210407-054522-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15189 and previous config saved to /var/cache/conftool/dbconfig/20210407-054127-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15188 and previous config saved to /var/cache/conftool/dbconfig/20210407-053940-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15187 and previous config saved to /var/cache/conftool/dbconfig/20210407-052901-root.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15186 and previous config saved to /var/cache/conftool/dbconfig/20210407-050758-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for schema change', diff saved to https://phabricator.wikimedia.org/P15185 and previous config saved to /var/cache/conftool/dbconfig/20210407-050530-marostegui.json
  • 03:28 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2408.codfw.wmnet with reason: REIMAGE
  • 03:26 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2408.codfw.wmnet with reason: REIMAGE
  • 03:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2407.codfw.wmnet with reason: REIMAGE
  • 03:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2407.codfw.wmnet with reason: REIMAGE
  • 02:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2406.codfw.wmnet with reason: REIMAGE
  • 02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2406.codfw.wmnet with reason: REIMAGE
  • 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2405.codfw.wmnet with reason: REIMAGE
  • 02:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2405.codfw.wmnet with reason: REIMAGE
  • 01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2404.codfw.wmnet with reason: REIMAGE
  • 01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2404.codfw.wmnet with reason: REIMAGE
  • 01:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: REIMAGE
  • 01:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: REIMAGE
  • 01:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2403.codfw.wmnet with reason: REIMAGE
  • 01:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2403.codfw.wmnet with reason: REIMAGE
  • 01:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2402.codfw.wmnet with reason: REIMAGE
  • 01:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2402.codfw.wmnet with reason: REIMAGE
  • 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2401.codfw.wmnet with reason: REIMAGE
  • 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2401.codfw.wmnet with reason: REIMAGE
  • 00:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2400.codfw.wmnet with reason: REIMAGE
  • 00:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2400.codfw.wmnet with reason: REIMAGE
  • 00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2399.codfw.wmnet with reason: REIMAGE
  • 00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2399.codfw.wmnet with reason: REIMAGE
  • 00:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2398.codfw.wmnet with reason: REIMAGE
  • 00:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2398.codfw.wmnet with reason: REIMAGE
  • 00:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2397.codfw.wmnet with reason: REIMAGE
  • 00:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2397.codfw.wmnet with reason: REIMAGE

2021-04-06

  • 23:36 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/resources/src/: b8a0dab: Fix missing styles on diff (T279099) (duration: 01m 08s)
  • 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 997b6f3: thwikisource: Enable transwiki import (T275281) (duration: 01m 08s)
  • 23:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4d12a86: Disable upcoming DiscussionTools features for now (duration: 01m 08s)
  • 19:49 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.36 (duration: 01m 50s)
  • 19:47 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.35 (duration: 02m 02s)
  • 19:45 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.34 (duration: 03m 37s)
  • 19:40 marxarelli: 1.36.0-wmf.38 rolled to group0. error rates steady and no new errors spotted (T278344)
  • 19:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.38
  • 18:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.38 (duration: 33m 31s)
  • 18:20 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Sturm . # T278856
  • 18:18 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
  • 18:18 bblack: cp2036 - re-pooling via confctl
  • 18:14 bblack: dns2001 - re-enabling and running puppet agent to restore service
  • 18:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.38
  • 18:07 andrew@deploy1002: Finished deploy [horizon/deploy@392708e]: Updating Horizon to 'main' to see if that works around T279465 (duration: 04m 10s)
  • 18:03 andrew@deploy1002: Started deploy [horizon/deploy@392708e]: Updating Horizon to 'main' to see if that works around T279465
  • 17:51 bblack: dns2001 - manually disabled puppet and stopped pdns-recursor.service (and thus implicitly BIRD) to manual-depool due to switch port issues
  • 17:49 bblack: cp2036 - explicitly confctl-depooled due to switch issues
  • 17:48 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet
  • 17:05 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (take two) (duration: 03m 30s)
  • 17:02 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (take two)
  • 16:28 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (duration: 04m 32s)
  • 16:23 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby
  • 16:16 krinkle@deploy1002: Synchronized php-1.36.0-wmf.37/skins/Vector/: I3234e7712b8c1 (duration: 01m 01s)
  • 15:49 Urbanecm: Start server-side upload for 3 video files (T279189, T279188, T279183)
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15182 and previous config saved to /var/cache/conftool/dbconfig/20210406-153123-root.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15180 and previous config saved to /var/cache/conftool/dbconfig/20210406-151619-root.json
  • 15:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on sretest1002.eqiad.wmnet with reason: bullseye tests
  • 15:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on sretest1002.eqiad.wmnet with reason: bullseye tests
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15179 and previous config saved to /var/cache/conftool/dbconfig/20210406-150115-root.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15178 and previous config saved to /var/cache/conftool/dbconfig/20210406-144612-root.json
  • 14:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 97 hosts with reason: upgrading openstack
  • 14:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 97 hosts with reason: upgrading openstack
  • 14:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
  • 14:30 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
  • 14:29 dcaro: populated thirdparty/ceph-octopus buster repo with reprepro (T274566)
  • 14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 13:57 moritzm: upgrading sretest1002 to bullseye
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163 for schema change', diff saved to https://phabricator.wikimedia.org/P15177 and previous config saved to /var/cache/conftool/dbconfig/20210406-134418-marostegui.json
  • 13:37 Urbanecm: Retrying server-side upload for 1 file (T279192)
  • 13:20 Urbanecm: Start server-side upload for 4 video files (T279191, T279192, T279193, T279190)
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15176 and previous config saved to /var/cache/conftool/dbconfig/20210406-124614-root.json
  • 12:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15175 and previous config saved to /var/cache/conftool/dbconfig/20210406-123111-root.json
  • 12:28 moritzm: installing netty security updates
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15174 and previous config saved to /var/cache/conftool/dbconfig/20210406-121607-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15173 and previous config saved to /var/cache/conftool/dbconfig/20210406-120104-root.json
  • 11:57 moritzm: installing openjpeg2 security updates on buster
  • 11:43 moritzm: removed mw2247 from debmonitor T277780
  • 11:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164 for schema change', diff saved to https://phabricator.wikimedia.org/P15172 and previous config saved to /var/cache/conftool/dbconfig/20210406-112839-marostegui.json
  • 11:07 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable legacy javascript globals on all wikis except some big ones (T72470) (duration: 01m 01s)
  • 10:57 moritzm: upload wmf-laptop 0.5.1 to buster-wikimedia component/wmf-sre-laptop
  • 10:55 moritzm: remove wmf-laptop 0.5.0 from buster-wikimedia (incorrect import to main, next upload will land in component/wmf-sre-laptop)
  • 10:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15171 and previous config saved to /var/cache/conftool/dbconfig/20210406-100329-root.json
  • 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1002.eqiad.wmnet with reason: REIMAGE
  • 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1002.eqiad.wmnet with reason: REIMAGE
  • 09:49 Urbanecm: Start server-side upload for 1 video file (T279418)
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15170 and previous config saved to /var/cache/conftool/dbconfig/20210406-094825-root.json
  • 09:41 Urbanecm: Start server-side upload for 4 video files (T279197, T279196, T279195, T279194)
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15169 and previous config saved to /var/cache/conftool/dbconfig/20210406-093322-root.json
  • 09:32 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for conf2003.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:31 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for conf2003.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for conf2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:29 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for conf2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for kraz.wikimedia.org: Renew puppet certificate - jbond@cumin1001
  • 09:28 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for kraz.wikimedia.org: Renew puppet certificate - jbond@cumin1001
  • 09:28 jbond42: renew puppet cert for kraz T279410
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15168 and previous config saved to /var/cache/conftool/dbconfig/20210406-091818-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es4 master', diff saved to https://phabricator.wikimedia.org/P15167 and previous config saved to /var/cache/conftool/dbconfig/20210406-083248-marostegui.json
  • 08:07 moritzm: installing underscore security updates on buster
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15166 and previous config saved to /var/cache/conftool/dbconfig/20210406-075957-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15165 and previous config saved to /var/cache/conftool/dbconfig/20210406-074453-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15164 and previous config saved to /var/cache/conftool/dbconfig/20210406-072950-root.json
  • 07:20 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15162 and previous config saved to /var/cache/conftool/dbconfig/20210406-071446-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15161 and previous config saved to /var/cache/conftool/dbconfig/20210406-065539-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for schema change', diff saved to https://phabricator.wikimedia.org/P15160 and previous config saved to /var/cache/conftool/dbconfig/20210406-065131-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15159 and previous config saved to /var/cache/conftool/dbconfig/20210406-064036-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for upgrade', diff saved to https://phabricator.wikimedia.org/P15158 and previous config saved to /var/cache/conftool/dbconfig/20210406-063938-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1020', diff saved to https://phabricator.wikimedia.org/P15157 and previous config saved to /var/cache/conftool/dbconfig/20210406-063858-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15156 and previous config saved to /var/cache/conftool/dbconfig/20210406-063759-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15155 and previous config saved to /var/cache/conftool/dbconfig/20210406-062532-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for decommission T274752', diff saved to https://phabricator.wikimedia.org/P15154 and previous config saved to /var/cache/conftool/dbconfig/20210406-061500-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15153 and previous config saved to /var/cache/conftool/dbconfig/20210406-061028-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for upgrade', diff saved to https://phabricator.wikimedia.org/P15152 and previous config saved to /var/cache/conftool/dbconfig/20210406-055324-marostegui.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 and db2147 after a crash', diff saved to https://phabricator.wikimedia.org/P15151 and previous config saved to /var/cache/conftool/dbconfig/20210406-053427-marostegui.json
  • 02:18 eileen: civicrm revision changed from 740e49d868 to eb9379daa3, config revision is 6779e3829a
  • 01:55 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:47 pt1979@cumin2001: START - Cookbook sre.dns.netbox

2021-04-05

  • 23:17 AaronSchulz: Running importMissingLocalNames.php on mwmaint1002 in a screen
  • 20:58 sbassett: re-deploy security patch for T270453 to wmf.37
  • 20:50 sbassett: re-deploy security patch for T270988 to wmf.37
  • 20:43 mholloway-shell@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for android.image_recommendation_interaction (duration: 00m 59s)
  • 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby (duration: 01m 41s)
  • 19:30 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby
  • 19:08 andrew@deploy1002: Finished deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon (duration: 02m 04s)
  • 19:06 andrew@deploy1002: Started deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon
  • 18:28 tgr_: Morning deploys done
  • 18:28 tgr@deploy1002: Synchronized dblists/growthexperiments.dblist: Config: Fix growthexperiments.dblist (T275171) (duration: 00m 58s)
  • 18:27 tgr@deploy1002: Synchronized wmf-config/config/frwiki.yaml: Config: Fix growthexperiments.dblist (T275171) (duration: 00m 59s)
  • 17:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:05 dpifke@deploy1002: Finished deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006 (duration: 00m 05s)
  • 17:05 dpifke@deploy1002: Started deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006
  • 16:45 Urbanecm: Start server-side upload of 4 video files (T279204, T279201, T279200, T279198)
  • 14:43 XioNoX: push pfw policies - T278970
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15149 and previous config saved to /var/cache/conftool/dbconfig/20210405-140825-root.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15148 and previous config saved to /var/cache/conftool/dbconfig/20210405-140751-root.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15147 and previous config saved to /var/cache/conftool/dbconfig/20210405-135321-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15146 and previous config saved to /var/cache/conftool/dbconfig/20210405-135248-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15145 and previous config saved to /var/cache/conftool/dbconfig/20210405-133818-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15144 and previous config saved to /var/cache/conftool/dbconfig/20210405-133744-root.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15143 and previous config saved to /var/cache/conftool/dbconfig/20210405-132314-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15142 and previous config saved to /var/cache/conftool/dbconfig/20210405-132240-root.json
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for upgrade', diff saved to https://phabricator.wikimedia.org/P15141 and previous config saved to /var/cache/conftool/dbconfig/20210405-131221-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P15140 and previous config saved to /var/cache/conftool/dbconfig/20210405-124118-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15139 and previous config saved to /var/cache/conftool/dbconfig/20210405-123751-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15138 and previous config saved to /var/cache/conftool/dbconfig/20210405-122247-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15137 and previous config saved to /var/cache/conftool/dbconfig/20210405-120744-root.json
  • 12:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scb[1001-1004].eqiad.wmnet
  • 12:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts scb[2001-2006].codfw.wmnet
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15136 and previous config saved to /var/cache/conftool/dbconfig/20210405-115240-root.json
  • 11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[1001-1004].eqiad.wmnet
  • 11:09 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
  • 11:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts scb[2001-2006].codfw.wmnet
  • 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P15135 and previous config saved to /var/cache/conftool/dbconfig/20210405-110506-marostegui.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15134 and previous config saved to /var/cache/conftool/dbconfig/20210405-105731-root.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15133 and previous config saved to /var/cache/conftool/dbconfig/20210405-105715-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15132 and previous config saved to /var/cache/conftool/dbconfig/20210405-104227-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15131 and previous config saved to /var/cache/conftool/dbconfig/20210405-104211-root.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P15130 and previous config saved to /var/cache/conftool/dbconfig/20210405-104010-marostegui.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P15129 and previous config saved to /var/cache/conftool/dbconfig/20210405-103318-marostegui.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15128 and previous config saved to /var/cache/conftool/dbconfig/20210405-103301-root.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15127 and previous config saved to /var/cache/conftool/dbconfig/20210405-102724-root.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15126 and previous config saved to /var/cache/conftool/dbconfig/20210405-102708-root.json
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15125 and previous config saved to /var/cache/conftool/dbconfig/20210405-101757-root.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15124 and previous config saved to /var/cache/conftool/dbconfig/20210405-101213-root.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15123 and previous config saved to /var/cache/conftool/dbconfig/20210405-101204-root.json
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15122 and previous config saved to /var/cache/conftool/dbconfig/20210405-100253-root.json
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P15121 and previous config saved to /var/cache/conftool/dbconfig/20210405-100246-marostegui.json
  • 09:50 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw - T276150 T276156
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15120 and previous config saved to /var/cache/conftool/dbconfig/20210405-094744-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15119 and previous config saved to /var/cache/conftool/dbconfig/20210405-091043-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15118 and previous config saved to /var/cache/conftool/dbconfig/20210405-082521-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15117 and previous config saved to /var/cache/conftool/dbconfig/20210405-080523-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15116 and previous config saved to /var/cache/conftool/dbconfig/20210405-075019-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15115 and previous config saved to /var/cache/conftool/dbconfig/20210405-073515-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15114 and previous config saved to /var/cache/conftool/dbconfig/20210405-072012-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15113 and previous config saved to /var/cache/conftool/dbconfig/20210405-064727-marostegui.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15112 and previous config saved to /var/cache/conftool/dbconfig/20210405-054951-marostegui.json
  • 05:30 marostegui: Deploy schema change on db1121, lag will appear on s4 on wikireplicas
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P15111 and previous config saved to /var/cache/conftool/dbconfig/20210405-053000-marostegui.json
  • 05:12 marostegui: Restart all sanitarium hosts to pick up new filters T278573

2021-04-04

  • 14:47 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 01m 36s)
  • 14:45 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch

2021-04-03

  • 19:20 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 11s)
  • 19:18 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
  • 17:30 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 35s)
  • 17:26 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
  • 16:44 elukey: power reset for ms-be2028 - not reachable via ssh, no tty available via mgmt console, NMI unrecoverable errors logged in iLO's system logs
  • 15:35 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 18s)
  • 15:33 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
  • 15:12 andrew@deploy1002: Finished deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch (duration: 11m 51s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch
  • 05:38 andrew@deploy1002: Finished deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 05s)
  • 05:35 andrew@deploy1002: Started deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch

2021-04-02

  • 22:31 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 22:31 bstorm@cumin1001: Added views for new wiki: trvwiki T276246
  • 22:08 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 22:08 mutante: pooled mw2395,mw2396 as API appservers running on new hardware
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[5-6].codfw.wmnet
  • 21:58 legoktm: legoktm@lists1002:~$ time sudo mailman-web rebuild_index
  • 21:56 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[5-6].codfw.wmnet
  • 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[5-6].codfw.wmnet
  • 21:48 mutante: mw2395, mw2396 - reboot - becoming API servers
  • 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[0-4].codfw.wmnet
  • 21:42 mutante: pooled 12 brand-new codfw appservers running on new hardware generation
  • 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw238[5-9].codfw.wmnet
  • 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2384.codfw.wmnet
  • 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2383.codfw.wmnet
  • 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
  • 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
  • 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
  • 21:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
  • 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
  • 21:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[0-4].codfw.wmnet
  • 21:34 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw238[3-9].codfw.wmnet
  • 21:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
  • 21:28 legoktm: imported python-xapian-haystack 2.1.0-6~wmf1 on apt1001 (T278717)
  • 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
  • 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2393.codfw.wmnet
  • 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2392.codfw.wmnet
  • 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2391.codfw.wmnet
  • 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2390.codfw.wmnet
  • 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2389.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2388.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2387.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2386.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2385.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2384.codfw.wmnet
  • 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2383.codfw.wmnet
  • 21:19 mutante: generating mcrouter certs for mw2395 through mw2404 (T278396)
  • 21:07 mutante: mw2383 through mw2394 - 'uptime && scap pull' via ssh -C (not cumin because it needs to run as non-root)
  • 20:58 mutante: mw238* - scap pull via cumin not possible because it doesn't work as root
  • 20:50 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: tweak to affinity group options (duration: 03m 39s)
  • 20:46 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: tweak to affinity group options
  • 20:44 mutante: mw2385 through mw2394 - serial rebooting
  • 20:43 mutante: mw2384 reboot
  • 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
  • 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: new_install
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: new_install
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev (duration: 01m 47s)
  • 20:39 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev
  • 20:09 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 20:09 bstorm@cumin1001: Added views for new wiki: taywiki T275836
  • 19:47 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
  • 19:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
  • 19:07 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:07 bstorm@cumin1001: Added views for new wiki: mnwwiktionary T276126
  • 18:44 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:44 mutante: [puppetmaster1001:~] $ sudo puppet node deactivate mw2247.codfw.wmnet
  • 18:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
  • 18:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
  • 17:57 legoktm: upgraded mailman3 python3-django-postorius on lists1002
  • 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 15:41 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 14:35 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw133[7-8].eqiad.wmnet
  • 14:34 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=videoscaler,name=mw133[5-6].eqiad.wmnet
  • 14:32 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw133[5-6].eqiad.wmnet
  • 14:31 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw133[7-8].eqiad.wmnet
  • 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
  • 14:29 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1111.eqiad.wmnet
  • 14:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
  • 14:20 Urbanecm: Start server-side upload for 3 video files (T279060, T279061, T279062)
  • 14:09 Urbanecm: Start server-side upload for 3 video files (T279138, T279137, T279136)
  • 13:42 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.37
  • 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
  • 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
  • 13:11 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/load.php: T278579 (duration: 00m 58s)
  • 13:10 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: T278579 (duration: 00m 57s)
  • 13:08 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: T278579 (duration: 00m 58s)
  • 11:46 Urbanecm: correction: Start server-side upload for 3 video files (T279079, T279080, T279104)
  • 11:45 Urbanecm: Start server-side upload for 3 images (T279079, T279080, T279104)
  • 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
  • 10:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
  • 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback group0 wikis to 1.36.0-wmf.36 - T278343
  • 09:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 and group2 wikis to 1.36.0-wmf.36 - T278343
  • 09:44 hashar@deploy1002: sync-wikiversions aborted: Revert group1 and group2 wikis to 1.36.0-wmf.36 (duration: 00m 01s)
  • 09:06 dcausse: remove dumps from wdqs1009 to free disk space
  • 07:33 effie: powercycle an-worker1080
  • 07:28 elukey: manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b
  • 03:54 dwisehaupt: replication user on fundraising db set to require ssl for connections at the mysql user level. db updated on frdb1004 and verified on a set of hosts
  • 03:16 dwisehaupt: replication user on payments db set to require ssl for connections at the mysql user level. db updated on payments1001 and verified on a set of hosts

2021-04-01

  • 23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: Revert "Turn on glent m1 AB test" T262612 (duration: 00m 58s)
  • 23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to 1be781d (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
  • 23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III Add hi-res version of mediawiki.org logos T268230 (duration: 00m 57s)
  • 23:10 thcipriani@deploy1002: Synchronized logos: Backport: Part II Add hi-res version of mediawiki.org logos T268230 (duration: 00m 57s)
  • 23:08 thcipriani@deploy1002: Synchronized static: Backport: Part I Add hi-res version of mediawiki.org logos T268230 (duration: 00m 59s)
  • 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2248.codfw.wmnet
  • 22:50 twentyafterfour@deploy1002: Finished deploy [releng/phatality@27ddd0b]: deploy phatality (duration: 00m 13s)
  • 22:50 twentyafterfour@deploy1002: Started deploy [releng/phatality@27ddd0b]: deploy phatality
  • 22:49 twentyafterfour: deploying phatality
  • 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2248.codfw.wmnet
  • 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
  • 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2246.codfw.wmnet
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2246.codfw.wmnet
  • 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2243.codfw.wmnet
  • 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2243.codfw.wmnet
  • 20:42 mutante: mw2243, mw2246, mw2247, mw2248 - depooled - replaced by mw2379, mw2380, mw2381, mw2382 (T277780)
  • 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
  • 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
  • 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
  • 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
  • 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2382.codfw.wmnet
  • 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2381.codfw.wmnet
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
  • 20:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
  • 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
  • 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
  • 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
  • 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
  • 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
  • 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
  • 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
  • 20:01 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 04s)
  • 20:01 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
  • 20:01 razzi@deploy1002: deploy aborted: Deployment of superset fd7c9eb71e193, released after 1.0.1hv (duration: 00m 00s)
  • 20:01 mutante: mw2379, mw2380, mw2381, mw2382 - scap pull
  • 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2382.codfw.wmnet
  • 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2381.codfw.wmnet
  • 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 19:59 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 21s)
  • 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2379.codfw.wmnet
  • 19:58 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
  • 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 19:56 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 12s)
  • 19:56 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
  • 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 19:51 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 19:37 mutante: pooled parse2001 again after twentyafterfour rebuilt the l10n cache for wmf.37 which fixed it and made Apache alert recover (T268524)
  • 19:34 mutante: mw2379, mw2380, mw2381, mw2382 - rebooting
  • 19:34 twentyafterfour@deploy1002: scap sync-l10n completed (1.36.0-wmf.37) (duration: 02m 38s)
  • 19:30 mutante: depooled parse2001 because on train deployment it caused "MWException: No localisation cache found for English" and then "HTTP CRITICAL: HTTP/1.1 500 Internal Server Error" (T268524)
  • 19:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 19:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
  • 19:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 19:21 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.37 refs T278343
  • 18:59 mutante: creating mcrouter certs for mw2379 through mw2382
  • 18:35 Urbanecm: Morning B&C window done
  • 18:33 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo/resources/mediasearch-vue/components/base/Dialog.vue: e77f2b9: Use appendChild() instead of append() (T278448) (duration: 01m 09s)
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b485d1c: Enable SandboxLink extension in ptwikinews (T278634) (duration: 01m 12s)
  • 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
  • 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
  • 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:59 Urbanecm: Start server-side upload of two files (T279082, T279081)
  • 16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1007.eqiad.wmnet
  • 16:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a7acf33: hrwiki: Fix help panel links (T275684) (duration: 01m 10s)
  • 16:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
  • 16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
  • 16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
  • 16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
  • 15:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
  • 15:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
  • 15:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
  • 15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
  • 15:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
  • 15:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
  • 15:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
  • 15:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
  • 14:52 volans: uploaded python3-wmflib_0.0.7 to bullseye-wikimedia
  • 14:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
  • 14:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
  • 14:22 effie: disable puppet on mw* canaries, rolling depool and pooling of canaries
  • 14:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
  • 14:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
  • 14:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
  • 13:59 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
  • 13:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
  • 13:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
  • 13:24 ema: cp3054: reboot with Linux 4.19.181+1 -- the kernel was not upgraded earlier during T273278 reboots due to broken dpkg status
  • 13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 13:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 12:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:53 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:47 moritzm: drain ganeti1022
  • 12:46 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 12:40 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 12:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
  • 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 12:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
  • 12:23 moritzm: drain ganeti1021
  • 12:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
  • 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 12:15 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
  • 12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 11:59 Urbanecm: Start server upload of two video files (~4 GB in total) # T278856
  • 11:55 moritzm: drain ganeti1020
  • 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable RelatedArticles on Timeless skin on German Wikipedia (T278611) (duration: 01m 08s)
  • 11:41 moritzm: drain ganeti1019
  • 11:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 11:23 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable MediaSearch by default for anonymous users (gerrit:674820) (duration: 01m 10s)
  • 11:20 moritzm: drain ganeti1018
  • 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 11:00 moritzm: drain ganeti1017
  • 10:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 10:39 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
  • 10:33 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
  • 10:33 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
  • 10:26 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
  • 09:07 hashar: contint2001: compressing files with 4 parallel executions: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -print0|xargs -0 -P4 gzip
  • 09:01 hashar: contint2001: compressing all fresnel trace--trace.json files: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip {} \+ # T249268
  • 08:52 moritzm: drain ganeti1011
  • 08:35 moritzm: failover Ganeti master in eqiad to ganeti1009
  • 08:25 moritzm: installing ldb security updates
  • 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 08:09 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 07:55 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 06:37 elukey: powercycle cp1087 (no ssh, no tty via serial console) - T278729
  • 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
  • 02:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
  • 02:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
  • 02:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
  • 02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
  • 02:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
  • 02:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
  • 01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
  • 01:52 Reedy: `echo "https://www.mediawiki.org/static/images/footer/poweredby_mediawiki_176x62.png" | mwscript purgeList.php --wiki=enwiki` T268230
  • 01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
  • 01:51 Reedy: `echo "https://www.mediawiki.org/favicon.ico" | mwscript purgeList.php --wiki=enwiki` T268230
  • 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
  • 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
  • 01:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
  • 01:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
  • 01:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
  • 01:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
  • 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
  • 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
  • 00:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
  • 00:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
  • 00:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
  • 00:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
  • 00:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:08 legoktm: uploaded mailman3 3.2.1-1+wmf1, postorius 1.2.4-1+wmf1 to apt.wikimedia.org
  • 00:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox

2021-03-31

  • 23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: bfc8f55: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
  • 23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: ad564a0: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
  • 23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: Include private folder in restricted image (T276145) (duration: 01m 08s)
  • 23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Use the new mediawiki logos, part II (T268230) (duration: 01m 11s)
  • 23:03 ladsgroup@deploy1002: Synchronized static: Use the new mediawiki logos, part I (T268230) (duration: 01m 09s)
  • 22:58 Urbanecm: Start server side upload for 3 files
  • 22:01 Urbanecm: Server side upload of three video files (T279011, T278956, T278955)
  • 22:01 eileen: civicrm revision changed from 2fcea570bd to 740e49d868, config revision is 6779e3829a
  • 20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
  • 19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37 refs T278343 (duration: 01m 08s)
  • 19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37 refs T278343
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37 refs T278343
  • 19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs T278376 T278343 (duration: 00m 58s)
  • 17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36 refs T278343
  • 17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37 refs T278343
  • 17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs T278343
  • 17:01 Urbanecm: Server side upload of three video files (T278959, T278958, T278957)
  • 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:57 papaul: disconnecting ps1-d8-codfw for replacement
  • 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
  • 14:02 Urbanecm: Server side upload of two video files (T278961, T278960)
  • 13:48 jynus: retrying s3 snapshot on codfw
  • 13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for T278220
  • 13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing T278220
  • 13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
  • 12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
  • 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
  • 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
  • 11:38 awight: EU deployment complete
  • 11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: Style change to mediasearch logged-in notice close (T274927) Suppress user notice on mobile (T274927) Reset namespace filter on cancel (T276261) (duration: 01m 08s)
  • 11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: vector: Disable WVUI search widget treatment A/B test (T276917) (duration: 01m 08s)
  • 10:48 effie: enable puppet on all mw* servers
  • 10:10 effie: disable puppet on all mw* hosts
  • 09:03 hashar: contint2001: enable puppet again
  • 08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
  • 04:35 eileen: civicrm revision changed from 7040b68c11 to 2fcea570bd, config revision is 6779e3829a
  • 02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
  • 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
  • 02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
  • 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
  • 01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: 3283ae5: Enable local uploads on Irish Wikipedia (T277723) (duration: 01m 08s)
  • 01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: 3283ae5: Enable local uploads on Irish Wikipedia (T277723) (duration: 01m 08s)
  • 01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
  • 01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE

2021-03-30

  • 23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)
  • 23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default (T278867) (duration: 01m 08s)
  • 23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default (T278867) (duration: 01m 08s)
  • 23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings (T278609)
  • 23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef306a3: Growth features: bnwiki: Enable impact module (T274793) (duration: 01m 07s)
  • 22:52 cstone: civicrm revision changed from ad430721f6 to 7040b68c11
  • 21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
  • 21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
  • 21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
  • 21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
  • 21:02 legoktm: scap pulling on mw1298
  • 20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
  • 20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:58 legoktm: killed remaining ffmpeg on mw1298
  • 20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
  • 20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
  • 20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
  • 20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
  • 20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
  • 20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37 refs T278343 (duration: 80m 32s)
  • 20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
  • 20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
  • 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
  • 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
  • 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
  • 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
  • 20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
  • 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
  • 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
  • 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
  • 20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
  • 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
  • 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
  • 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
  • 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
  • 20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
  • 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
  • 20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
  • 20:02 twentyafterfour: when syncing the 1.36.0-wmf.37 promotion to testwikis, one server failed (mw1298.eqiad.wmnet) and two more appear to be hung: scap has been stuck at 2 left, 99% without making any progress for a long time. refs T278343
  • 19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
  • 19:58 bblack: repool cp1087 - T278729
  • 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37 refs T278343
  • 18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
  • 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
  • 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
  • 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
  • 17:19 legoktm: killed all ffmpeg on mw1294
  • 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
  • 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
  • 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
  • 17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:40 effie: enable puppet on mw* hosts
  • 16:10 mutante: mw1296 - started ferm
  • 16:10 mutante: mw1308 - started ferm
  • 16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
  • 16:07 mutante: mw1309 - systemctl start ferm
  • 16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
  • 16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
  • 16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
  • 15:59 akosiaris: depool a number of hosts from videoscalers
  • 15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
  • 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
  • 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
  • 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
  • 15:29 hnowlan: moving all test tables out of cassandra directories on aqs hosts
  • 14:59 effie: disable puppet on mediawiki servers to deploy 663565
  • 14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki (T278350)
  • 14:32 arturo: manually start update-openstack-mirror.service on sodium (T278505)
  • 13:02 jbond42: rollout lxml update T278822
  • 12:55 jbond42: update spamassassin on lists, otrs and mx T278820
  • 12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait (T277060)
  • 12:38 jbond42: update python(3)-pygments
  • 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
  • 12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
  • 11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable legacy javascript global variables in group1, Some increase in client errors is expected (T72470) (duration: 01m 11s)
  • 09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
  • 09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 T250666
  • 08:05 dcausse: refreshing wdqs entities (T278693)
  • 07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - T278734
  • 07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - T274940
  • 06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
  • 06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2021-03-29

  • 19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
  • 16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
  • 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
  • 13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
  • 13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
  • 12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
  • 11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
  • 11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
  • 09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 09:16 ryankemper: T267927 `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id T267927 --reload-data wikidata --reason 'T267927: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
  • 09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy I156f32925f693 (duration: 00m 08s)
  • 08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy I156f32925f693
  • 07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
  • 07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
  • 07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - T278478 (duration: 01m 08s)
  • 07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition (T278478) (duration: 01m 08s)
  • 07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435

2021-03-27

  • 19:25 elukey: powercycle elastic1060 - T278630
  • 06:10 ryankemper: T267927 `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
  • 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
  • 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload

2021-03-26

  • 22:27 tzatziki: reset password for Philroc
  • 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
  • 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - T277795 (duration: 01m 06s)
  • 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - T266067 T278429 (duration: 31m 43s)
  • 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - T266067 T278429
  • 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
  • 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
  • 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 13:02 moritzm: reimaging theemin T275873
  • 12:56 moritzm: drain ganeti1014
  • 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 12:37 moritzm: drain ganeti1013
  • 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing phab:T278350' -u 'Martin Urbanec' batch.txt` (T278350)
  • 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing phab:T278350' -u 'Martin Urbanec' batch.txt` to fix an UBN task (T278350)
  • 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
  • 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
  • 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
  • 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy I955cbfc244 (duration: 00m 08s)
  • 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy I955cbfc244
  • 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
  • 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
  • 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) T224586
  • 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
  • 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
  • 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
  • 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy I12ac21d877c (duration: 00m 12s)
  • 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy I12ac21d877c
  • 09:28 moritzm: drain ganeti1012
  • 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 08:38 moritzm: drain ganeti1010
  • 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
  • 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
  • 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`

2021-03-25

  • 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
  • 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
  • 23:20 jhuneidi@deploy1002: Synchronized README: DEMO: README (duration: 01m 07s)
  • 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
  • 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
  • 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - T274940
  • 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - T274940
  • 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
  • 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
  • 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: ce7d2d7: ruwiki: flaggedrevs: Delete autoeditor group (T275337) (duration: 01m 08s)
  • 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ce7d2d7: ruwiki: flaggedrevs: Delete autoeditor group (T275337) (duration: 01m 06s)
  • 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' ` finished (T275337)
  • 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # T278391
  • 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # T275337
  • 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 39cd4f1: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group (T275337) (duration: 01m 09s)
  • 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dcfb7fe: ruwiki: flaggedrevs: Do not remove autoreview group (T275337) (duration: 01m 14s)
  • 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 3fb6646: ruwiki: flaggedrevs: Revoke review from sysop group (T275811) (duration: 01m 06s)
  • 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: 29660f9: Update altwiki logo (3/3; T275819) (duration: 01m 06s)
  • 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 29660f9: Update altwiki logo (2/3; T275819) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: 29660f9: Update altwiki logo (1/3; T275819) (duration: 01m 10s)
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 62be4e7: Disable magic links on enwiki (T275951) (duration: 01m 20s)
  • 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
  • 18:09 marxarelli: scap sync-file .pipeline Config: Include patches in restricted image (T271274)
  • 18:06 hnowlan: draining and restarting aqs1004-b cassandra
  • 17:45 hnowlan: draining and restarting aqs1004-a cassandra
  • 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
  • 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 16:32 moritzm: restarting apache on an-tool1007/turnilo
  • 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
  • 16:24 jbond42: restart slapd on ldap-replica
  • 16:22 jbond42: restart slapd on ldap-corp
  • 16:20 jbond42: restart apache on lists1002
  • 16:18 jbond42: restart apache on netbox
  • 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - T278400 (duration: 01m 32s)
  • 16:12 jbond42: restart routinator on rpki*
  • 16:12 moritzm: restarting nginx on apt*
  • 16:10 moritzm: restarting apache on dbmonitor
  • 16:08 moritzm: restart Apache on matomo/piwik
  • 16:03 jbond42: restart apache service on gerrit
  • 16:02 jbond42: restart idp service
  • 16:01 ema: A:cp rolling ats-{tls,backend}-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
  • 15:45 moritzm: installing openssl updates on buster
  • 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
  • 13:45 moritzm: drain ganeti1009
  • 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
  • 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
  • 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
  • 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
  • 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
  • 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
  • 12:14 moritzm: drain ganeti1008
  • 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
  • 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
  • 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable Legacy javascript in fawikiquote (T72470) (duration: 01m 07s)
  • 11:46 moritzm: drain ganeti1007
  • 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: Inform anonymous A/B test by tracking time from navigationStart (T275807) (duration: 01m 09s)
  • 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
  • 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
  • 11:33 ladsgroup@deploy1002: Synchronized dblists/: tawiki: Enable Growth features in dark mode, Part II (T278369) (duration: 01m 07s)
  • 11:32 ladsgroup@deploy1002: Synchronized wmf-config: tawiki: Enable Growth features in dark mode (T278369) (duration: 01m 30s)
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
  • 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
  • 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
  • 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
  • 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
  • 11:10 moritzm: drain ganeti1006
  • 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
  • 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
  • 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
  • 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
  • 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
  • 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
  • 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
  • 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
  • 10:34 moritzm: drain ganeti1005
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
  • 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 09:26 moritzm: drain ganeti2024
  • 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 08:45 moritzm: drain ganeti2023
  • 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
  • 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
  • 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 (T276687)
  • 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia (T276687)
  • 07:35 jynus: restart db2135 T278408 T273281
  • 07:05 effie: enable puppet on all mediawiki servers
  • 06:57 XioNoX: Option 82: use-vlan-id
  • 06:53 effie: enable puppet on jobrunners
  • 06:47 effie: enable puppet on parsoid
  • 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
  • 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
  • 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
  • 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
  • 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
  • 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
  • 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
  • 00:34 mutante: mw2377, mw2378 - first scap pull
  • 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
  • 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
  • 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
  • 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
  • 00:29 legoktm: syncing facts for puppet-compiler
  • 00:23 mutante: mw2377, mw2378 - reboot
  • 00:14 twentyafterfour: phabricator update complete
  • 00:10 twentyafterfour: deploying phabricator
  • 00:05 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`

2021-03-24

  • 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
  • 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
  • 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
  • 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
  • 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:19 mutante: webperf2001 - restarted apache
  • 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
  • 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
  • 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - T277865 (duration: 01m 07s)
  • 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - T278379 (duration: 01m 07s)
  • 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - T278375 (duration: 01m 07s)
  • 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
  • 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
  • 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
  • 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:57 ryankemper: T267927 Host key is missing for `wdqs2008` leading to `data-transfer` cookbook failing, looking into resolving
  • 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:42 ryankemper: T267927 Re-enabled puppet on `wdqs2008` and ran puppet agent
  • 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
  • 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
  • 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
  • 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 3/3) (duration: 01m 08s)
  • 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 2/3) (duration: 01m 06s)
  • 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 1/3) (duration: 01m 07s)
  • 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: ced0920: Enable Growth features on eswiki in dark mode (T278235; 3/3) (duration: 01m 06s)
  • 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: ced0920: Enable Growth features on eswiki in dark mode (T278235; 2/3) (duration: 01m 07s)
  • 18:52 urbanecm@deploy1002: sync-file aborted: ced0920: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
  • 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ced0920: Enable Growth features on eswiki in dark mode (T278235; 1/3) (duration: 01m 08s)
  • 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5aa0506: Promote several Growth target wikis out of dark mode (T277491; T276830; T276123; T276816; T275550; T276450) (duration: 01m 08s)
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 333393d: Add autopatrol to autoreviewers in en.wikibooks (T278300) (duration: 01m 09s)
  • 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:25 effie: upgrade memcached on mc-gp* hosts
  • 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
  • 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
  • 15:42 moritzm: reduce RAM for irc2001 to 2G, was originally created with 8 G T224579
  • 15:35 effie: enable puppet on all mediawiki + memcached hosts
  • 15:20 moritzm: drain ganeti2022
  • 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 14:35 moritzm: drain ganeti2021
  • 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
  • 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
  • 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 13:29 moritzm: installing irc1001
  • 13:15 moritzm: drain ganeti2020
  • 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 12:28 effie: enabling puppet on mediawiki and memcached servers
  • 12:10 jynus: restart dbprov200[12] T271913
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
  • 11:57 Andrew-WMDE_: EU deploys done
  • 11:53 jynus: restart dbprov100[12] T271913
  • 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: MassMessage: Unbreak remote content fetching (T276936) (duration: 01m 08s)
  • 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
  • 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: MassMessage: Unbreak remote content fetching (T276936) (duration: 01m 07s)
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
  • 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable CodeMirror accessibility colors on initial wikis (T276346) (duration: 01m 08s)
  • 11:15 jynus: restart serially db2097 db2098 db2099 db2100 T271913
  • 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable bracket matching on group0 and wikitech (T273591) (duration: 01m 25s)
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
  • 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
  • 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
  • 10:31 jynus: restart db1171 T271913
  • 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 10:14 jynus: restart db1145 T271913
  • 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 10:03 jynus: restart db1139 T271913
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
  • 09:51 jynus: restart db1116 T271913
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
  • 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
  • 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
  • 08:16 gehel: restarting wdqs updater on all nodes for config change
  • 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
  • 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
  • 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
  • 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
  • 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
  • 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
  • 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
  • 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
  • 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
  • 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
  • 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
  • 07:09 moritzm: installing squid security updates
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled T275633', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
  • 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
  • 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
  • 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 03:41 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
  • 03:41 ryankemper: T274204 Restarting `codfw` reboots; the timestamp argument should prevent it from wasting time on nodes that have already been rebooted
  • 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 03:39 ryankemper: T274204 Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 02:38 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
  • 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 01:59 ryankemper: T274204 For now I'll proceed to the reboots of `codfw`
  • 01:59 ryankemper: T274204 `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
  • 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
  • 01:49 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
  • 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
  • 01:36 eileen: civicrm revision changed from f36a0b08f0 to ad430721f6, config revision is 26b02db7ba
  • 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
  • 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
  • 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
  • 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE

2021-03-23

  • 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
  • 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
  • 22:33 dwisehaupt: pushing 60f9baaf50b to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - T170321
  • 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
  • 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
  • 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
  • 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
  • 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:41 eileen: civicrm revision changed from 39d24e8b0a to f36a0b08f0, config revision is 26b02db7ba
  • 20:24 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:24 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:21 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet
  • 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
  • 20:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts auth1002.eqiad.wmnet
  • 20:01 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
  • 19:51 jforrester@deploy1002: Finished deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans (duration: 00m 08s)
  • 19:51 jforrester@deploy1002: Started deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans
  • 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove schema overrides for 6 finished EL migrations - T267347 T271164 T267351 T267348 T267343 T267353 (duration: 01m 07s)
  • 18:40 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/vendor/: Bump wikimedia/parsoid to 0.13.0-a29 (duration: 01m 16s)
  • 18:20 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:16 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:10 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add irc2001.wikimedia.org (running buster) as second irc server (T224579) (duration: 01m 08s)
  • 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 15:32 moritzm: installing libsdl2 security updates
  • 15:31 akosiaris: pool echostore for eqiad (the first of the larger services traffic-wise)
  • 15:31 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=echostore
  • 15:25 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T274200)
  • 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:43 akosiaris: pool more services in eqiad k8s. T277741. Only the very large ones, traffic-wise, are still on codfw
  • 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=recommendation-api
  • 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=push-notifications
  • 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=proton
  • 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mobileapps
  • 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
  • 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=linkrecommendation
  • 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams-internal
  • 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams
  • 14:20 akosiaris: pool a few more services in eqiad k8s. T277741
  • 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=wikifeeds
  • 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=termbox
  • 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=similar-users
  • 14:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36
  • 14:06 akosiaris: pool a few services in eqiad k8s. T277741
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=api-gateway
  • 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apertium
  • 14:05 moritzm: installing pygments security updates on stretch
  • 14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2008.codfw.wmnet
  • 13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2008.codfw.wmnet
  • 13:55 hashar@deploy1002: Finished scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - T274940 (duration: 31m 57s)
  • 13:54 elukey: sudo systemctl reload apache2 on prometheus[12]00[34] to pick up new k8s-mlserve instance settings
  • 13:28 moritzm: drain ganeti2008
  • 13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 13:23 hashar@deploy1002: Started scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - T274940
  • 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 13:15 ema: cp3054: install varnishkafka built explicitly against varnish 6.0.1-1wm2 to fix broken dpkg status T264398
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15054 and previous config saved to /var/cache/conftool/dbconfig/20210323-130543-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15053 and previous config saved to /var/cache/conftool/dbconfig/20210323-130153-root.json
  • 12:58 moritzm: drain ganeti2018
  • 12:58 akosiaris: remove and decommission argon, chroline, acrab, acrux T277741, T277191
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15052 and previous config saved to /var/cache/conftool/dbconfig/20210323-125155-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15051 and previous config saved to /var/cache/conftool/dbconfig/20210323-125039-root.json
  • 12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15050 and previous config saved to /var/cache/conftool/dbconfig/20210323-124650-root.json
  • 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 85%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15049 and previous config saved to /var/cache/conftool/dbconfig/20210323-123651-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15048 and previous config saved to /var/cache/conftool/dbconfig/20210323-123535-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15047 and previous config saved to /var/cache/conftool/dbconfig/20210323-123146-root.json
  • 12:27 moritzm: drain ganeti2017
  • 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15046 and previous config saved to /var/cache/conftool/dbconfig/20210323-122148-root.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15045 and previous config saved to /var/cache/conftool/dbconfig/20210323-122032-root.json
  • 12:17 akosiaris: remove all scheduled downtimes for the k8s cluster. T277741
  • 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15044 and previous config saved to /var/cache/conftool/dbconfig/20210323-121642-root.json
  • 12:09 moritzm: drain ganeti2016
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15043 and previous config saved to /var/cache/conftool/dbconfig/20210323-120644-root.json
  • 11:55 moritzm: installing libcaca security updates
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15042 and previous config saved to /var/cache/conftool/dbconfig/20210323-115141-root.json
  • 11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
  • 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 35%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15041 and previous config saved to /var/cache/conftool/dbconfig/20210323-113637-root.json
  • 11:31 Lucas_WMDE: EU backport&config window done
  • 11:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable DiscussionTools' beta features on dewiki (T276494) (duration: 00m 58s)
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15040 and previous config saved to /var/cache/conftool/dbconfig/20210323-112133-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15039 and previous config saved to /var/cache/conftool/dbconfig/20210323-110630-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P15038 and previous config saved to /var/cache/conftool/dbconfig/20210323-110553-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15037 and previous config saved to /var/cache/conftool/dbconfig/20210323-110347-root.json
  • 11:01 moritzm: installing tomcat8 security updates
  • 10:56 jayme: all services re-deployed to k8s eqiad - T277741
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15036 and previous config saved to /var/cache/conftool/dbconfig/20210323-105126-root.json
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15035 and previous config saved to /var/cache/conftool/dbconfig/20210323-104843-root.json
  • 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 10:43 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
  • 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:41 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15034 and previous config saved to /var/cache/conftool/dbconfig/20210323-103623-root.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15033 and previous config saved to /var/cache/conftool/dbconfig/20210323-103340-root.json
  • 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
  • 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:24 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:22 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc
  • 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15031 and previous config saved to /var/cache/conftool/dbconfig/20210323-102119-root.json
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 10:19 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.33 (duration: 01m 48s)
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15030 and previous config saved to /var/cache/conftool/dbconfig/20210323-101836-root.json
  • 10:16 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.32 (duration: 14m 47s)
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1005.eqiad.wmnet
  • 10:02 hashar: scap clean --delete 1.36.0-wmf.32 # T274940
  • 10:01 hashar: Applied security patches for 1.36.0-wmf.36 # T274940
  • 09:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
  • 09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15029 and previous config saved to /var/cache/conftool/dbconfig/20210323-095437-marostegui.json
  • 09:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
  • 09:53 akosiaris: deploy helmfile.d/admin_ng for eqiad T277741
  • 09:53 hashar: scap prep 1.36.0-wmf.36 # T274940
  • 09:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:53 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
  • 09:53 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
  • 09:51 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:50 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
  • 09:50 jayme@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
  • 09:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:46 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
  • 09:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
  • 09:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15028 and previous config saved to /var/cache/conftool/dbconfig/20210323-094257-marostegui.json
  • 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
  • 09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
  • 09:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
  • 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
  • 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1005.eqiad.wmnet
  • 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
  • 09:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
  • 09:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
  • 09:36 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
  • 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
  • 09:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1017.eqiad.wmnet
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
  • 09:32 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1165 to dbctl, depooled - T258361', diff saved to https://phabricator.wikimedia.org/P15027 and previous config saved to /var/cache/conftool/dbconfig/20210323-093246-marostegui.json
  • 09:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
  • 09:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
  • 09:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
  • 09:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
  • 09:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
  • 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 to clone db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15025 and previous config saved to /var/cache/conftool/dbconfig/20210323-092600-marostegui.json
  • 09:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
  • 09:18 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dc=eqiad,cluster=kubernetes,name=kubernetes1017.eqiad.wmnet
  • 09:17 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
  • 09:17 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
  • 09:16 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1017.eqiad.wmnet
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P15024 and previous config saved to /var/cache/conftool/dbconfig/20210323-091432-marostegui.json
  • 09:05 akosiaris: reboot kubetcd100[456] for kernel upgrades. T277741 T273278
  • 09:04 akosiaris: empty etcd T277741
  • 08:43 akosiaris: poweroff argon and chlorine T277741
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15023 and previous config saved to /var/cache/conftool/dbconfig/20210323-083957-root.json
  • 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
  • 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
  • 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
  • 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
  • 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
  • 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
  • 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
  • 08:33 akosiaris: eqiad services in k8s depooled. T277741
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
  • 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
  • 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
  • 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
  • 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
  • 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
  • 08:28 akosiaris: downtime all services in T277741 for 24H
  • 08:25 akosiaris: beginning the k8s upgrade/reinit process. T277741
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15022 and previous config saved to /var/cache/conftool/dbconfig/20210323-082454-root.json
  • 08:24 moritzm: installing mariadb-10.3 updates on buster (just client-side libs/tools, unrelated to the main wmf-mariadb packages)
  • 08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
  • 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15021 and previous config saved to /var/cache/conftool/dbconfig/20210323-082213-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15020 and previous config saved to /var/cache/conftool/dbconfig/20210323-080949-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15019 and previous config saved to /var/cache/conftool/dbconfig/20210323-080709-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15017 and previous config saved to /var/cache/conftool/dbconfig/20210323-075445-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15016 and previous config saved to /var/cache/conftool/dbconfig/20210323-075253-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15015 and previous config saved to /var/cache/conftool/dbconfig/20210323-075230-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15014 and previous config saved to /var/cache/conftool/dbconfig/20210323-075216-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15013 and previous config saved to /var/cache/conftool/dbconfig/20210323-075206-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15012 and previous config saved to /var/cache/conftool/dbconfig/20210323-073726-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15011 and previous config saved to /var/cache/conftool/dbconfig/20210323-073713-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15010 and previous config saved to /var/cache/conftool/dbconfig/20210323-073702-root.json
  • 07:36 elukey: create a 50g lvm volume on prometheus[12]00[34] for the k8s-mlserve cluster - T272918
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15009 and previous config saved to /var/cache/conftool/dbconfig/20210323-072352-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15008 and previous config saved to /var/cache/conftool/dbconfig/20210323-072223-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15007 and previous config saved to /var/cache/conftool/dbconfig/20210323-072209-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15006 and previous config saved to /var/cache/conftool/dbconfig/20210323-070849-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15005 and previous config saved to /var/cache/conftool/dbconfig/20210323-070719-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15004 and previous config saved to /var/cache/conftool/dbconfig/20210323-070705-root.json
  • 07:02 marostegui: Upgrade kernel on db1101
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15003 and previous config saved to /var/cache/conftool/dbconfig/20210323-065947-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15002 and previous config saved to /var/cache/conftool/dbconfig/20210323-065836-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15001 and previous config saved to /var/cache/conftool/dbconfig/20210323-065345-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15000 and previous config saved to /var/cache/conftool/dbconfig/20210323-063842-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14999 and previous config saved to /var/cache/conftool/dbconfig/20210323-062942-marostegui.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 10%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14998 and previous config saved to /var/cache/conftool/dbconfig/20210323-062338-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086', diff saved to https://phabricator.wikimedia.org/P14997 and previous config saved to /var/cache/conftool/dbconfig/20210323-062059-marostegui.json
  • 06:20 marostegui: Upgrade kernel on db1086
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14996 and previous config saved to /var/cache/conftool/dbconfig/20210323-060701-root.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 master and remove read-only from s7 T274336', diff saved to https://phabricator.wikimedia.org/P14995 and previous config saved to /var/cache/conftool/dbconfig/20210323-060216-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance T274336', diff saved to https://phabricator.wikimedia.org/P14994 and previous config saved to /var/cache/conftool/dbconfig/20210323-060104-marostegui.json
  • 06:00 marostegui: Starting s7 eqiad failover from db1086 to db1136 - T274336
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1174 to api T274336', diff saved to https://phabricator.wikimedia.org/P14993 and previous config saved to /var/cache/conftool/dbconfig/20210323-051346-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1136 before failover T274336', diff saved to https://phabricator.wikimedia.org/P14992 and previous config saved to /var/cache/conftool/dbconfig/20210323-051210-marostegui.json
  • 00:07 tstarling@deploy1002: Synchronized wmf-config: use RequestTimeout library step 3: clean up (duration: 00m 58s)
  • 00:06 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: use RequestTimeout library step 2: enable new system (duration: 00m 57s)
  • 00:04 tstarling@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: use RequestTimeout library step 1: disable old request timeout system (duration: 00m 58s)
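
Several cookbook runs logged above end with `END (FAIL)` (for example the `sre.hosts.downtime` runs for kubernetes1003 and kubernetes1004). The following is a minimal sketch of how such entries could be pulled out of a day's log. It assumes each entry has the shape `HH:MM actor: message` seen in this page; the regular expressions and the function names `parse_entry` and `failed_cookbooks` are hypothetical helpers written for illustration, not part of any Wikimedia tooling.

```python
import re

# Assumed entry shape, inferred from the entries on this page:
#   "HH:MM actor: message", where actor is "user" or "user@host".
# A leading bullet and indentation are tolerated.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^\s:]+): (?P<message>.*)$"
)

# Cookbook completion messages as they appear in the log, e.g.
#   "END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for ..."
COOKBOOK_END_RE = re.compile(
    r"^END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)(?P<rest>.*)$"
)


def parse_entry(line):
    """Split one log line into time, actor and message; None if it does not match."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def failed_cookbooks(lines):
    """Return (time, cookbook, exit_code) for every cookbook run that ended in FAIL."""
    failures = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        end = COOKBOOK_END_RE.match(entry["message"])
        if end and end.group("status") == "FAIL":
            failures.append(
                (entry["time"], end.group("cookbook"), int(end.group("code")))
            )
    return failures


# Three entries copied from the 2021-03-23 log above.
SAMPLE = [
    "09:36 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) "
    "for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE",
    "09:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) "
    "for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE",
    "09:05 akosiaris: reboot kubetcd100[456] for kernel upgrades. T277741 T273278",
]

if __name__ == "__main__":
    print(failed_cookbooks(SAMPLE))
```

Entries that are free-form notes (such as the 09:05 reboot note) still parse as log lines but are ignored by `failed_cookbooks`, since their message does not match the cookbook completion pattern.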

2021-03-22

  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
  • 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
  • 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: T262612: Start glent m1 ab test (duration: 01m 53s)
  • 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
  • 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
  • 22:52 mutante: decom mw2249
  • 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
  • 21:08 sbassett: Deployed security patch for T272244
  • 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
  • 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
  • 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
  • 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
  • 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
  • 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) (T277127)
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 25247c9: hrwiki: Configure mentorship for Growth team features (T275684) (duration: 01m 00s)
  • 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 951601f: Grant enwiki pagemovers the delete-redirect right (T278131) (duration: 00m 59s)
  • 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)
  • 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
  • 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
  • 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
  • 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - T277771
  • 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
  • 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
  • 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
  • 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
  • 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
  • 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:51 moritzm: installing libdbi-perl security updates
  • 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
  • 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
  • 09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
  • 09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
  • 09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
  • 08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
  • 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
  • 08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
  • 08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) T258361
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1165', diff saved to https://phabricator.wikimedia.org/P14965 and previous config saved to /var/cache/conftool/dbconfig/20210322-080020-marostegui.json
  • 07:51 elukey: stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - T273865
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14964 and previous config saved to /var/cache/conftool/dbconfig/20210322-073747-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14963 and previous config saved to /var/cache/conftool/dbconfig/20210322-072243-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change', diff saved to https://phabricator.wikimedia.org/P14962 and previous config saved to /var/cache/conftool/dbconfig/20210322-071430-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14961 and previous config saved to /var/cache/conftool/dbconfig/20210322-070740-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14960 and previous config saved to /var/cache/conftool/dbconfig/20210322-065236-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl T276302', diff saved to https://phabricator.wikimedia.org/P14959 and previous config saved to /var/cache/conftool/dbconfig/20210322-063732-marostegui.json
  • 06:11 marostegui: Sanitize db1124 db2094 db1154: taywiki trvwiki mnwwiktionary
  • 04:28 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2021-03-21

  • 10:25 _joe_: restarting gerrit on gerrit1001, using 45G of reserved memory
  • 09:22 elukey: install apache2-bin-dbgsym on gerrit1001 - T277127
  • 08:50 qchris: Restarting apache on gerrit1001 again (all apache workers busy again) see T277127
  • 08:18 qchris: Restarting apache on gerrit1001 (all apache workers busy)

2021-03-20

  • 00:22 tzatziki: altering emails for STei (WMF) and SGrabarczuk (WMF)

2021-03-19

  • 21:11 mutante: scandium - stop apache and rerun puppet, which fails after reimaging because it tries to run nginx on port 80, which is already used by apache (T268248)
  • 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
  • 20:15 mutante: scandium - reimaging with buster
  • 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
  • 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
  • 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2245.codfw.wmnet
  • 19:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2245.codfw.wmnet
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2244.codfw.wmnet
  • 19:53 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1002.wikimedia.org
  • 19:50 mutante: testreduce1001 - confirmed MariaDB @@datadir is /srv/data/mysql and deleting /var/lib/mysql (T277580)
  • 19:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2244.codfw.wmnet
  • 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2245.codfw.wmnet
  • 19:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1002.wikimedia.org
  • 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2244.codfw.wmnet
  • 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet,service=canary
  • 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet,service=canary
  • 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2252.codfw.wmnet,service=canary
  • 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2251.codfw.wmnet,service=canary
  • 19:24 mutante: deploy2002 - re-enabled puppet, reverted patch of scap-master-sync
  • 18:46 mutante: deploy2002 - disable puppet, copy modified version of scap-master-sync over it that does not --exclude="**/cache/l10n/*.cdb" (for T275826)
  • 16:01 effie: upgrade memcached on mc-gp200*
  • 12:36 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
  • 12:34 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
  • 12:10 effie: upgrade memcached on mc1026,mc2026
  • 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:30 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
  • 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:42 moritzm: installing dbmonitor1002 T224589
  • 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:11 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 10:05 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 10:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 09:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:22 elukey: upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia
  • 07:16 ryankemper: T275885 `ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'` (change hadn't been merged when I ran the agent earlier)
  • 04:04 eileen: civicrm revision changed from 99bf1c9210 to 39d24e8b0a, config revision is 26b02db7ba
  • 03:27 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
  • 03:26 ryankemper: T275885 `ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'`
  • 02:43 ryankemper: T275885 Revoking current `relforge` TLS cert in advance of generation of new cert: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet`
  • 00:51 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: T277772 (duration: 00m 58s)
  • 00:45 mutante: testreduce1001 - stop mysql; rsyncing /var/lib/mysql to /srv/data/mysql (T277580)

2021-03-18

  • 23:56 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Don't define a default icon (T274199) (duration: 00m 57s)
  • 23:38 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 00m 57s)
  • 23:35 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 00m 58s)
  • 23:25 dduvall@deploy1002: Synchronized .pipeline: config: Use build environment HTTP proxy for APT sources (T277109) (duration: 01m 02s)
  • 23:06 brennen: train status: 1.36.0-wmf.35 (T274939) stable on all wikis after deploy of hotfix for T277795
  • 22:53 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 01m 07s)
  • 22:30 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:29 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:25 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 20:37 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: (no justification provided) (duration: 01m 05s)
  • 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.35
  • 18:28 legoktm: re-enabled puppet on registry*
  • 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 44eddcc: hrwiki: Deploy Growth features to newcomers (T275684) (duration: 01m 08s)
  • 18:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 179d9e5: mswiki: Enable Growth features in stealth mode (T277562; 2/2) (duration: 01m 08s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 179d9e5: mswiki: Enable Growth features in stealth mode (T277562; 1/2) (duration: 01m 11s)
  • 17:58 legoktm: disabled puppet on registry* for rolling out https://gerrit.wikimedia.org/r/672537
  • 17:50 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 55aa6cb: tewiki: Enable Growth features in stealth mode (T277491; 2/2) (duration: 01m 08s)
  • 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2242.codfw.wmnet
  • 17:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 55aa6cb: tewiki: Enable Growth features in stealth mode (T277491; 1/2) (duration: 01m 10s)
  • 17:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 04342e9: simplewiki: Enable Growth team features in stealth mode (T277550) (duration: 01m 09s)
  • 17:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04342e9: simplewiki: Enable Growth team features in stealth mode (T277550) (duration: 01m 10s)
  • 17:40 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 17:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2242.codfw.wmnet
  • 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2241.codfw.wmnet
  • 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2241.codfw.wmnet
  • 17:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2240.codfw.wmnet
  • 16:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2240.codfw.wmnet
  • 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2239.codfw.wmnet
  • 16:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2239.codfw.wmnet
  • 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2242.codfw.wmnet
  • 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2241.codfw.wmnet
  • 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2240.codfw.wmnet
  • 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2239.codfw.wmnet
  • 15:33 shdubsh: clean up dead letter queue and restart all logstashes
  • 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 dcausse: repooling wdqs1005
  • 14:29 hashar: Restarting CI Jenkins for plugin upgrade
  • 13:49 elukey: reboot analytics1066
  • 13:23 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/Wikibase/repo: languageLabelDescriptionAliases: use getLanguageNameByCode (T275611 T277722) (duration: 01m 14s)
  • 12:58 jbond42: upload cas_6.3.2 to apt buster-wikimedia
  • 11:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:34 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:25 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 896c9f0: flaggedrevs: Disable multiple dimensions in hewikisource (duration: 01m 09s)
  • 11:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/GrowthExperiments/includes/HomepageHooks.php: 3b2aa1a: Remove variant C from list of valid variants (T277727) (duration: 01m 09s)
  • 11:16 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:14 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0005676: GrowthExperiments: set $wgGEHomepageNewAccountVariants to D only (T277727) (duration: 01m 10s)
  • 11:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: NOOP: e7f5eac: Enable CentralAuth IRC feed in beta cluster (T277432) (duration: 01m 12s)
  • 09:13 _joe_: hard reboot of snapshot1005
  • 09:04 _joe_: attempted reboot of snapshot1005, read-only filesystem and probably disks are broken beyond repair
  • 08:27 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
  • 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
  • 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14946 and previous config saved to /var/cache/conftool/dbconfig/20210318-080258-root.json
  • 08:02 akosiaris: reimage ml-serve1004 to debug a docker volume_group issue
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14945 and previous config saved to /var/cache/conftool/dbconfig/20210318-074754-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14944 and previous config saved to /var/cache/conftool/dbconfig/20210318-073250-root.json
  • 07:20 dcausse: depooling & restarting blazegraph on wdqs1005
  • 07:19 marostegui: Deploy schema change on s4 codfw master, lag will appear - T276150 T276156
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14943 and previous config saved to /var/cache/conftool/dbconfig/20210318-071747-root.json
  • 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1161 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14942 and previous config saved to /var/cache/conftool/dbconfig/20210318-063241-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2120', diff saved to https://phabricator.wikimedia.org/P14941 and previous config saved to /var/cache/conftool/dbconfig/20210318-062201-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P14940 and previous config saved to /var/cache/conftool/dbconfig/20210318-060445-marostegui.json
  • 03:46 andrewbogott: restarting slapd on seaborgium, serpens, and r-o ldap replicas (we're getting irregular connection failures)
  • 00:05 eileen: tools revision changed from b7b4060c30 to ef54260b0d

2021-03-17

  • 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c730dd5: idwiki: Deploy Growth features to newcomers (T259024) (duration: 01m 08s)
  • 23:40 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 5c14e7d: Define confirmed group in MediaWikiServices hook (T275334, T277704, T275310, T275333) (duration: 01m 08s)
  • 23:30 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/CirrusSearch/profiles/FallbackProfiles.config.php: Add fallback profile including glent m1 (duration: 01m 42s)
  • 22:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
  • 22:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
  • 22:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
  • 22:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
  • 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
  • 20:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 20:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
  • 20:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 20:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
  • 20:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
  • 20:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 20:42 andrew@deploy1002: Finished deploy [horizon/deploy@17ea780]: display volume usage summaries (duration: 03m 34s)
  • 20:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
  • 20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
  • 20:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
  • 20:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
  • 20:39 andrew@deploy1002: Started deploy [horizon/deploy@17ea780]: display volume usage summaries
  • 20:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
  • 20:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
  • 20:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2238.codfw.wmnet
  • 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2238.codfw.wmnet
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
  • 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
  • 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2237.codfw.wmnet
  • 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2237.codfw.wmnet
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2236.codfw.wmnet
  • 19:48 andrew@deploy1002: Finished deploy [horizon/deploy@3c2d1ee]: support VM resizing (duration: 03m 42s)
  • 19:44 andrew@deploy1002: Started deploy [horizon/deploy@3c2d1ee]: support VM resizing
  • 19:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2236.codfw.wmnet
  • 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2238.codfw.wmnet
  • 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2237.codfw.wmnet
  • 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2236.codfw.wmnet
  • 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2235.codfw.wmnet
  • 19:29 mutante: testreduce1001 - rebooted, fdisk /dev/vdb, create partition table, create primary partition, mkfs.ext4 /dev/vdb1
  • 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2235.codfw.wmnet
  • 19:18 andrew@deploy1002: Finished deploy [horizon/deploy@8967660]: clean up a reverted hack (duration: 03m 25s)
  • 19:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2234.codfw.wmnet
  • 19:14 andrew@deploy1002: Started deploy [horizon/deploy@8967660]: clean up a reverted hack
  • 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.35 (duration: 01m 26s)
  • 19:05 mutante: ganeti1011 - rebooting VM testreduce1001 on ganeti level for T277580
  • 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.35
  • 19:02 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2234.codfw.wmnet
  • 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2233.codfw.wmnet
  • 18:58 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/: sessionTick: Tick right away on sessionReset (T277515) (duration: 01m 10s)
  • 18:52 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/vendor/: Bump wikimedia/parsoid to 0.13.0-a28 (T276649) (duration: 01m 18s)
  • 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2233.codfw.wmnet
  • 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2235.codfw.wmnet
  • 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2234.codfw.wmnet
  • 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2233.codfw.wmnet
  • 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2232.codfw.wmnet
  • 18:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Define Portal and Portal talk namespace for niawiki (T277671) (duration: 01m 11s)
  • 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2232.codfw.wmnet
  • 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2231.codfw.wmnet
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2231.codfw.wmnet
  • 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2230.codfw.wmnet
  • 17:50 razzi: update firewall rules to allow mysql-sqoop in analytics-in4 to access clouddb1021 - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/672797
  • 17:47 ejegg: updated payments-wiki from 0405ea1723 to b06009c099
  • 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2230.codfw.wmnet
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:50 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 02m 32s)
  • 16:48 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
  • 16:45 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 00m 07s)
  • 16:45 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
  • 16:44 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: more support for disabled flavors (duration: 00m 07s)
  • 16:44 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: more support for disabled flavors
  • 16:38 effie: upgrade memcached on mc1025, mc2025
  • 16:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
  • 16:04 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/RevisionRecord.php: (no justification provided) (duration: 00m 58s)
  • 15:54 ejegg: updated standalone SmashPig deployment from 58b070db1a to 250a8570d1
  • 15:23 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dbmonitor1002.wikimedia.org
  • 14:56 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host dbmonitor1002.wikimedia.org
  • 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14935 and previous config saved to /var/cache/conftool/dbconfig/20210317-142532-root.json
  • 14:18 jayme: rebooting testreduce1001 for T277580
  • 14:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14934 and previous config saved to /var/cache/conftool/dbconfig/20210317-141028-root.json
  • 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
  • 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics
  • 14:01 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28] (duration: 04m 19s)
  • 13:59 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 13:58 moritzm: added bullseye tftpboot environment T275873
  • 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28]
  • 13:56 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28] (duration: 00m 06s)
  • 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28]
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14933 and previous config saved to /var/cache/conftool/dbconfig/20210317-135522-root.json
  • 13:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 13:52 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 13:52 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28] (duration: 11m 36s)
  • 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics-external
  • 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-logging-external
  • 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=api-gateway
  • 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=echostore
  • 13:47 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 13:46 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 13:41 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 13:40 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28]
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14932 and previous config saved to /var/cache/conftool/dbconfig/20210317-134018-root.json
  • 13:38 kormat: stopping db2137:s5 T277632
  • 13:33 kormat: stopping db2089:s5 T277632
  • 13:31 otto@deploy1002: Finished deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697 (duration: 03m 24s)
  • 13:27 otto@deploy1002: Started deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697
  • 13:23 jynus: stopping s5 instance on db2099 and restoring from backup T277632
  • 13:17 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams
  • 13:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams-internal
  • 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mobileapps
  • 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=wikifeeds
  • 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=termbox
  • 13:12 moritzm: installing tiff security updates
  • 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=similar-users
  • 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=push-notifications
  • 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=proton
  • 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=linkrecommendation
  • 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
  • 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=apertium
  • 12:11 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 12:10 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-main
  • 11:49 marostegui: Deploy schema change on s8, lag will appear on wiki replicas T276150 T276156
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P14931 and previous config saved to /var/cache/conftool/dbconfig/20210317-114746-marostegui.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14930 and previous config saved to /var/cache/conftool/dbconfig/20210317-114601-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14929 and previous config saved to /var/cache/conftool/dbconfig/20210317-113057-root.json
  • 11:20 jayme: switch restbase-async back to codfw (the newly initialized cluster)
  • 11:17 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 11:17 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14928 and previous config saved to /var/cache/conftool/dbconfig/20210317-111553-root.json
  • 11:09 moritzm: restarting tomcat on idp.wikimedia.org
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14927 and previous config saved to /var/cache/conftool/dbconfig/20210317-110050-root.json
  • 09:59 moritzm: imported PHP 5.6.40 to thirdparty/php56 T224589
  • 09:47 vgutierrez: restart varnish-fe on cp5011
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P14926 and previous config saved to /var/cache/conftool/dbconfig/20210317-092443-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14925 and previous config saved to /var/cache/conftool/dbconfig/20210317-092357-root.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14924 and previous config saved to /var/cache/conftool/dbconfig/20210317-090853-root.json
  • 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=recommendation-api
  • 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
  • 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14923 and previous config saved to /var/cache/conftool/dbconfig/20210317-090108-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 T276302', diff saved to https://phabricator.wikimedia.org/P14922 and previous config saved to /var/cache/conftool/dbconfig/20210317-085852-marostegui.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14921 and previous config saved to /var/cache/conftool/dbconfig/20210317-085350-root.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14920 and previous config saved to /var/cache/conftool/dbconfig/20210317-084605-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14919 and previous config saved to /var/cache/conftool/dbconfig/20210317-083846-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14918 and previous config saved to /var/cache/conftool/dbconfig/20210317-083101-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14917 and previous config saved to /var/cache/conftool/dbconfig/20210317-081557-root.json
  • 07:50 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for schema change', diff saved to https://phabricator.wikimedia.org/P14916 and previous config saved to /var/cache/conftool/dbconfig/20210317-073403-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14915 and previous config saved to /var/cache/conftool/dbconfig/20210317-073024-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14914 and previous config saved to /var/cache/conftool/dbconfig/20210317-071520-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14913 and previous config saved to /var/cache/conftool/dbconfig/20210317-070017-root.json
  • 06:52 marostegui: Stop MySQL on db1082 to clone db1161 (lag will appear on s5 on wikireplicas) - T258361
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to clone db1161 T258361', diff saved to https://phabricator.wikimedia.org/P14911 and previous config saved to /var/cache/conftool/dbconfig/20210317-065146-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2150 into s7 T275633', diff saved to https://phabricator.wikimedia.org/P14910 and previous config saved to /var/cache/conftool/dbconfig/20210317-064606-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14909 and previous config saved to /var/cache/conftool/dbconfig/20210317-064513-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2150 to s7, depooled T275633', diff saved to https://phabricator.wikimedia.org/P14908 and previous config saved to /var/cache/conftool/dbconfig/20210317-060358-marostegui.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P14907 and previous config saved to /var/cache/conftool/dbconfig/20210317-054206-marostegui.json
  • 02:25 eileen: civicrm revision changed from 8c137b94f0 to 99bf1c9210, config revision is ef2767ab91
  • 01:55 eileen: civicrm revision changed from 550be50105 to 8c137b94f0, config revision is ef2767ab91
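
The db1087 entries above record a staged repool at 25%, 50%, 75% and 100%, logged at 13:40, 13:55, 14:10 and 14:25. A minimal sketch of that schedule, assuming equal percentage steps and a fixed 15-minute interval as seen in those timestamps. The function name `repool_schedule` and its parameters are hypothetical, and this is not the actual dbctl tooling:

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool.

    Assumption: each step is applied a fixed number of minutes after the
    previous one, as the timestamps in the log suggest.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


if __name__ == "__main__":
    # Start time taken from the first db1087 repool entry in the log.
    for when, pct in repool_schedule(datetime(2021, 3, 17, 13, 40)):
        print(f"{when:%H:%M} repool at {pct}%")
```

Run with the 13:40 start time, the sketch reproduces the four db1087 timestamps in the log. The logged seconds (for example 134018, 135522) drift by a few seconds per step, so the real interval is only approximately 15 minutes.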

2021-03-16

  • 23:56 krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: I8619ab9e92b, T277362, T275531 (duration: 00m 58s)
  • 23:51 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/Scribunto/: I84e8732d8d - tmp logging (duration: 00m 58s)
  • 23:47 Krinkle: There is an uncommitted dirty diff in /srv/mediawiki-staging/php-1.36.0-wmf.34/extensions/WikimediaMaintenance/createExtensionTables.php
  • 23:31 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I1ca4f30c2, T262612 (duration: 00m 57s)
  • 23:22 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Icd6635cb302cc, T277332 (duration: 00m 58s)
  • 23:07 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I8d8c94d95c6 (duration: 00m 59s)
  • 23:03 twentyafterfour: applied hotfix to phabricator/src/infrastructure/customfield/storage/PhabricatorCustomFieldStorage.php and restarted php-fpm
  • 23:02 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I4097cbcb1d5 (duration: 00m 59s)
  • 22:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Ie24eb2077 (duration: 00m 58s)
  • 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2232.codfw.wmnet
  • 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2231.codfw.wmnet
  • 20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2230.codfw.wmnet
  • 20:49 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation (duration: 03m 44s)
  • 20:45 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation
  • 20:15 XioNoX: remove DMZ zone from pfw3-eqiad - T174203
  • 20:00 brennen: 1.36.0-wmf.35 train status (T274939): blocked at group0 on T277362
  • 19:52 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34
  • 19:52 XioNoX: commit changes to pfw3-eqiad - T274422
  • 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
  • 19:31 dancy@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.35 (duration: 33m 41s)
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2229.codfw.wmnet
  • 19:11 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2229.codfw.wmnet
  • 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2229.codfw.wmnet
  • 19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2228.codfw.wmnet
  • 19:07 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
  • 19:06 XioNoX: commit changes to pfw3-codfw - T274422
  • 18:58 dancy@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.35
  • 18:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2228.codfw.wmnet
  • 18:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:43 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:41 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:03 ppchelko@deploy1002: Finished deploy [restbase/deploy@f99ddaa]: Add new wikis T275837 T271983 T273466 T276127 T273460 T276249 (duration: 31m 31s)
  • 17:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
  • 17:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
  • 17:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2227.codfw.wmnet
  • 17:32 ppchelko@deploy1002: Started deploy [restbase/deploy@f99ddaa]: Add new wikis T275837 T271983 T273466 T276127 T273460 T276249
  • 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2227.codfw.wmnet
  • 17:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2226.codfw.wmnet
  • 16:47 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 16:44 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
  • 16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
  • 16:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
  • 16:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2226.codfw.wmnet
  • 16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2227.codfw.wmnet
  • 16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2226.codfw.wmnet
  • 16:17 mutante: testreduce1001 - gzip /var/log/daemon.log.1 ; apt-get clean .. free some disk space
  • 15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
  • 15:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
  • 15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
  • 15:46 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14905 and previous config saved to /var/cache/conftool/dbconfig/20210316-153446-root.json
  • 15:32 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: T277006 (duration: 04m 56s)
  • 15:27 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: T277006
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14904 and previous config saved to /var/cache/conftool/dbconfig/20210316-151943-root.json
  • 15:07 hashar@deploy1002: Finished deploy [integration/docroot@cf787a5]: (no justification provided) (duration: 00m 30s)
  • 15:06 hashar@deploy1002: Started deploy [integration/docroot@cf787a5]: (no justification provided)
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14903 and previous config saved to /var/cache/conftool/dbconfig/20210316-150439-root.json
  • 15:03 hashar@deploy1002: Finished deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # T275468 (duration: 00m 07s)
  • 15:03 hashar@deploy1002: Started deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # T275468
  • 14:58 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T276251 T276129 T275839)
  • 14:53 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2015.codfw.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14902 and previous config saved to /var/cache/conftool/dbconfig/20210316-144935-root.json
  • 14:37 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T276251 T276129 T275839)
  • 13:45 moritzm: powercycling ganeti2015, stuck on reboot
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:34 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:33 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
  • 13:31 moritzm: drain ganeti2015
  • 13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 13:30 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14901 and previous config saved to /var/cache/conftool/dbconfig/20210316-132844-marostegui.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14900 and previous config saved to /var/cache/conftool/dbconfig/20210316-132814-root.json
  • 13:28 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:27 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
  • 13:26 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:24 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246; 2/2) (duration: 00m 57s)
  • 13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:23 urbanecm@deploy1002: Synchronized static/images/project-logos/: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246; 1/2) (duration: 01m 01s)
  • 13:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:22 urbanecm@deploy1002: sync-file aborted: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246) (duration: 00m 00s)
  • 13:20 moritzm: drain ganeti2014
  • 13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
  • 13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 13:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:16 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14899 and previous config saved to /var/cache/conftool/dbconfig/20210316-131310-root.json
  • 13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:03 akosiaris: sync all services on the new codfw kubernetes cluster T277191
  • 13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
  • 13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
  • 12:59 moritzm: drain ganeti2013
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14898 and previous config saved to /var/cache/conftool/dbconfig/20210316-125807-root.json
  • 12:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:53 Urbanecm: New wiki creation is done
  • 12:51 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:50 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 1426d04: flaggedrevs: Simplify the config a bit (duration: 00m 58s)
  • 12:46 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 06s)
  • 12:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating mnwwiktionary (T276125) (duration: 00m 57s)
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14897 and previous config saved to /var/cache/conftool/dbconfig/20210316-124303-root.json
  • 12:42 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating mnwwiktionary (T276125) (duration: 01m 00s)
  • 12:41 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating mnwwiktionary (T276125) (duration: 01m 01s)
  • 12:40 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating mnwwiktionary (T276125)
  • 12:39 urbanecm@deploy1002: Synchronized dblists: Creating mnwwiktionary (T276125) (duration: 00m 57s)
  • 12:39 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 12:37 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating mnwwiktionary (T276125) (duration: 00m 58s)
  • 12:36 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating mnwwiktionary (T276125) (duration: 00m 58s)
  • 12:34 urbanecm@deploy1002: Synchronized langlist: Creating trvwiki (T276246) (duration: 00m 58s)
  • 12:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating trvwiki (T276246) (duration: 00m 57s)
  • 12:32 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating trvwiki (T276246) (duration: 00m 58s)
  • 12:31 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating trvwiki (T276246)
  • 12:29 urbanecm@deploy1002: Synchronized dblists: Creating trvwiki (T276246) (duration: 00m 57s)
  • 12:28 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 12:28 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating trvwiki (T276246) (duration: 01m 02s)
  • 12:27 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating trvwiki (T276246) (duration: 00m 57s)
  • 12:20 urbanecm@deploy1002: Synchronized langlist: Creating taywiki (T275803) (duration: 00m 57s)
  • 12:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating taywiki (T275803) (duration: 00m 58s)
  • 12:17 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating taywiki (T275803) (duration: 00m 57s)
  • 12:17 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 12:16 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating taywiki (T275803) (duration: 00m 58s)
  • 12:14 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating taywiki (T275803)
  • 12:12 urbanecm@deploy1002: Synchronized dblists: Creating taywiki (T275803) (duration: 00m 58s)
  • 12:11 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating taywiki (T275803) (duration: 01m 02s)
  • 12:10 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating taywiki (T275803) (duration: 00m 59s)
  • 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
  • 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
  • 12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 11:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for schema change', diff saved to https://phabricator.wikimedia.org/P14896 and previous config saved to /var/cache/conftool/dbconfig/20210316-114310-marostegui.json
  • 11:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2015.codfw.wmnet
  • 11:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2016.codfw.wmnet
  • 11:32 effie: upgrade memcached in mc1023, mc2023
  • 11:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2006.codfw.wmnet
  • 11:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2016.codfw.wmnet
  • 11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2015.codfw.wmnet
  • 11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14895 and previous config saved to /var/cache/conftool/dbconfig/20210316-112931-root.json
  • 11:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes2006.codfw.wmnet
  • 11:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
  • 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c444517: 4e66529: dff200b: Enable DiscussionTools features on several projects (T276493; T276498; T277103) (duration: 00m 57s)
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2005.codfw.wmnet
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2017.codfw.wmnet
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f0d5465: Enable DiscussionTools beta features on enwiki (T273146) (duration: 00m 58s)
  • 11:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2005.codfw.wmnet
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14893 and previous config saved to /var/cache/conftool/dbconfig/20210316-111427-root.json
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 835f9ab: Enable ContentTranslation as a default tool in Amharic, Maltese and Uzbek Wikipedias (T276765) (duration: 01m 00s)
  • 11:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
  • 11:08 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
  • 11:08 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
  • 11:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
  • 11:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
  • 11:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
  • 11:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
  • 11:03 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
  • 11:02 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
  • 11:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
  • 11:00 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
  • 10:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2017.codfw.wmnet
  • 10:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14892 and previous config saved to /var/cache/conftool/dbconfig/20210316-105924-root.json
  • 10:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
  • 10:58 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
  • 10:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
  • 10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
  • 10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
  • 10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
  • 10:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
  • 10:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
  • 10:52 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14891 and previous config saved to /var/cache/conftool/dbconfig/20210316-105128-root.json
  • 10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
  • 10:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
  • 10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2006.codfw.wmnet
  • 10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2005.codfw.wmnet
  • 10:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
  • 10:49 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
  • 10:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
  • 10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2015.codfw.wmnet
  • 10:46 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2016.codfw.wmnet
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14890 and previous config saved to /var/cache/conftool/dbconfig/20210316-104420-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14889 and previous config saved to /var/cache/conftool/dbconfig/20210316-103625-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14887 and previous config saved to /var/cache/conftool/dbconfig/20210316-102121-root.json
  • 10:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14886 and previous config saved to /var/cache/conftool/dbconfig/20210316-100617-root.json
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
  • 10:03 moritzm: drain ganeti2012
  • 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 09:59 akosiaris: Push new certs for kubemaster.svc.codfw.wmnet - T277191
  • 09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 49%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14885 and previous config saved to /var/cache/conftool/dbconfig/20210316-095113-root.json
  • 09:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
  • 09:48 moritzm: drain ganeti2011
  • 09:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
  • 09:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
  • 09:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
  • 09:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14884 and previous config saved to /var/cache/conftool/dbconfig/20210316-094117-marostegui.json
  • 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14883 and previous config saved to /var/cache/conftool/dbconfig/20210316-093609-root.json
  • 09:34 akosiaris: poweroff acrux and acrab T277191
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14881 and previous config saved to /var/cache/conftool/dbconfig/20210316-092204-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14880 and previous config saved to /var/cache/conftool/dbconfig/20210316-092106-root.json
  • 09:18 akosiaris: switch restbase-async to eqiad since the kubernetes codfw cluster is being reinitialized and it makes little sense to have it there while the callers will run in eqiad only
  • 09:15 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
  • 09:12 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=restbase-async
  • 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wikifeeds
  • 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=termbox
  • 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=similar-users
  • 09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=sessionstore
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=recommendation-api
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=push-notifications
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=proton
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mobileapps
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=linkrecommendation
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams-internal
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
  • 09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-logging-external
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics-external
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=echostore
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=api-gateway
  • 09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=apertium
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14879 and previous config saved to /var/cache/conftool/dbconfig/20210316-090701-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 15%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14878 and previous config saved to /var/cache/conftool/dbconfig/20210316-090602-root.json
  • 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 08:59 akosiaris: starting the k8s codfw cluster reinitialization process
  • 08:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
  • 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
  • 08:57 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 08:56 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14877 and previous config saved to /var/cache/conftool/dbconfig/20210316-085157-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14876 and previous config saved to /var/cache/conftool/dbconfig/20210316-085058-root.json
  • 08:47 marostegui: Check tables on db2150 db2120 T276742
  • 08:42 moritzm: remove Java 8 from contint/releases T269354
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14875 and previous config saved to /var/cache/conftool/dbconfig/20210316-083653-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14874 and previous config saved to /var/cache/conftool/dbconfig/20210316-083555-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 2%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14873 and previous config saved to /var/cache/conftool/dbconfig/20210316-082051-root.json
  • 08:18 godog: enable nick enforcing for logmsgbot - T276303
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14872 and previous config saved to /var/cache/conftool/dbconfig/20210316-080547-root.json
  • 07:51 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14871 and previous config saved to /var/cache/conftool/dbconfig/20210316-072910-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14870 and previous config saved to /var/cache/conftool/dbconfig/20210316-071407-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14869 and previous config saved to /var/cache/conftool/dbconfig/20210316-065903-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P14868 and previous config saved to /var/cache/conftool/dbconfig/20210316-065840-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108', diff saved to https://phabricator.wikimedia.org/P14867 and previous config saved to /var/cache/conftool/dbconfig/20210316-065814-marostegui.json
  • 06:52 marostegui: Stop MySQL on db2120 to clone db2150 - T275633
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T275633', diff saved to https://phabricator.wikimedia.org/P14865 and previous config saved to /var/cache/conftool/dbconfig/20210316-065148-marostegui.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14864 and previous config saved to /var/cache/conftool/dbconfig/20210316-064358-root.json
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
  • 05:35 marostegui: Stop MySQL on db1076 to clone db1162 T258361
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14862 and previous config saved to /var/cache/conftool/dbconfig/20210316-053516-marostegui.json

2021-03-15

  • 23:31 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove back-compat from when IRC feed servers was a string (T224579) (duration: 00m 59s)
  • 23:24 legoktm@deploy1002: Synchronized wmf-config/: Define IRC feed servers as an array in {Production,Labs}Services.php (T224579) (duration: 00m 59s)
  • 23:23 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Support having multiple IRC feed servers (T224579) (duration: 00m 58s)
  • 23:13 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=mw2225.codfw.wmnet
  • 23:11 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: GlobalWatchlist: allow watching up to 50 sites (T276195) (duration: 01m 04s)
  • 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
  • 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
  • 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
  • 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
  • 21:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps (duration: 03m 02s)
  • 20:59 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps
  • 20:55 legoktm: re-enabled puppet on kubestage2001, uncordoned kubestage2002
  • 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2225.codfw.wmnet
  • 19:57 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest (duration: 01m 49s)
  • 19:55 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest
  • 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2225.codfw.wmnet
  • 19:53 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mw2224.codfw.wmnet
  • 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
  • 19:43 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 19:37 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 19:27 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 18:56 dduvall@deploy1002: Synchronized .pipeline: config: Initial multiversion pipeline configuration pipeline: add building the webserver image (T274182) (duration: 00m 59s)
  • 18:55 dduvall@deploy1002: Synchronized multiversion/: config: Initial multiversion pipeline configuration pipeline: add building the webserver image (T274182) (duration: 00m 59s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e5a7284: Enable DiscussionTools for enwikibooks (T276851) (duration: 00m 59s)
  • 18:41 legoktm: puppet disabled on kubestage1001 for debugging docker-registry credentials
  • 18:38 urbanecm@deploy1002: Synchronized wmf-config/config/enwikibooks.yaml: b6a8df0: Enable visualeditor on enwikibooks by default (T276851; 2/2) (duration: 01m 00s)
  • 18:37 foks: removing 1 file from eowiki, for legal compliance
  • 18:35 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: b6a8df0: Enable visualeditor on enwikibooks by default (T276851; 1/2) (duration: 00m 58s)
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b70a75c: Configure default search namespaces for thwikisource (T275280) (duration: 00m 59s)
  • 18:18 hoo: Updated the Wikidata property suggester with data from the 2021-03-08 JSON dump (with pre-applied T132839 workarounds)
  • 18:17 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: a7eb550: Use master version of clientError.js (duration: 00m 58s)
  • 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a8234a9: Add deleterevision right to botadmin group on fawiki (T277358) (duration: 00m 59s)
  • 18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2223.codfw.wmnet
  • 18:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
  • 18:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
  • 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2223.codfw.wmnet
  • 17:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2222.codfw.wmnet
  • 17:30 hnowlan: disabling puppet on aqs100[4-9].eqiad.wmnet to test change to password logic in puppet
  • 17:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2222.codfw.wmnet
  • 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2223.codfw.wmnet
  • 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2222.codfw.wmnet
  • 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2221.codfw.wmnet
  • 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2221.codfw.wmnet
  • 17:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
  • 17:03 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
  • 16:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2221.codfw.wmnet
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2224.codfw.wmnet
  • 16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
  • 16:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2220.codfw.wmnet
  • 16:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2224.codfw.wmnet
  • 16:48 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
  • 16:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2220.codfw.wmnet
  • 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
  • 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
  • 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
  • 16:29 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 16:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 16:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 16:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 16:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 16:05 moritzm: draining ganeti2010
  • 16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 15:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 15:48 moritzm: draining ganeti2009
  • 15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2007.codfw.wmnet
  • 15:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2007.codfw.wmnet
  • 15:33 moritzm: draining ganeti2007
  • 15:27 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
  • 15:24 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14858 and previous config saved to /var/cache/conftool/dbconfig/20210315-151648-root.json
  • 15:16 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
  • 15:14 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
  • 15:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14857 and previous config saved to /var/cache/conftool/dbconfig/20210315-150144-root.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14856 and previous config saved to /var/cache/conftool/dbconfig/20210315-144641-root.json
  • 14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14855 and previous config saved to /var/cache/conftool/dbconfig/20210315-143137-root.json
  • 14:28 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P14854 and previous config saved to /var/cache/conftool/dbconfig/20210315-140809-marostegui.json
  • 14:04 dcausse: re-pooling wdqs1005
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14853 and previous config saved to /var/cache/conftool/dbconfig/20210315-135426-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14852 and previous config saved to /var/cache/conftool/dbconfig/20210315-133921-root.json
  • 13:25 Urbanecm: Deploy security patch for T152394
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14851 and previous config saved to /var/cache/conftool/dbconfig/20210315-132418-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14849 and previous config saved to /var/cache/conftool/dbconfig/20210315-130914-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14848 and previous config saved to /var/cache/conftool/dbconfig/20210315-123930-marostegui.json
  • 12:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/: 41a2aaa: Revert "Rewite MoveLeadParagraphTransform based on mobile apps approach" (T277302) (duration: 00m 58s)
  • 12:31 Lucas_WMDE: maintenance scripts for T270249 completed successfully, no more terms for deleted items found on stat1007
  • 12:30 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/GrowthExperiments/: fa2abfa: Manual submodule update of GrowthExperiments repository (T276966) (duration: 00m 59s)
  • 12:29 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 5m39s
  • 12:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5555,9593p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, remaining 4039 items
  • 12:22 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 8min
  • 12:19 _joe_: depooled mw1347 for testing
  • 12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 555,5554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 5000 items
  • 12:12 Lucas_WMDE: finished in 43s
  • 12:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 55,554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 500 items
  • 12:10 Lucas_WMDE: finished in 5.1s
  • 12:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5,54p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 50 items
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14847 and previous config saved to /var/cache/conftool/dbconfig/20210315-115826-root.json
  • 11:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:50 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14846 and previous config saved to /var/cache/conftool/dbconfig/20210315-114323-root.json
  • 11:34 moritzm: restarting FPM on mw canaries to pick up new libtiff
  • 11:30 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
  • 11:28 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14844 and previous config saved to /var/cache/conftool/dbconfig/20210315-112819-root.json
  • 11:22 moritzm: installing tiff security updates
  • 11:17 moritzm: installing golang-1.7 security updates
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14843 and previous config saved to /var/cache/conftool/dbconfig/20210315-111315-root.json
  • 11:00 volans: upgraded spicerack on cumin1001 to 0.0.49-1+deb10u1
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P14842 and previous config saved to /var/cache/conftool/dbconfig/20210315-105855-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14841 and previous config saved to /var/cache/conftool/dbconfig/20210315-105820-root.json
  • 10:56 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2001.codfw.wmnet with reason: test
  • 10:55 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: test
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14840 and previous config saved to /var/cache/conftool/dbconfig/20210315-104316-root.json
  • 10:42 moritzm: installing pygments security updates on buster
  • 10:33 volans: upgraded spicerack on cumin2001 to 0.0.49-1+deb10u1
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14839 and previous config saved to /var/cache/conftool/dbconfig/20210315-102813-root.json
  • 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14838 and previous config saved to /var/cache/conftool/dbconfig/20210315-102648-kormat.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14837 and previous config saved to /var/cache/conftool/dbconfig/20210315-101309-root.json
  • 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14836 and previous config saved to /var/cache/conftool/dbconfig/20210315-101143-kormat.json
  • 10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14835 and previous config saved to /var/cache/conftool/dbconfig/20210315-100337-kormat.json
  • 10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1114.eqiad.wmnet with reason: schema change T267767
  • 10:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1114.eqiad.wmnet with reason: schema change T267767
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14834 and previous config saved to /var/cache/conftool/dbconfig/20210315-095607-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14833 and previous config saved to /var/cache/conftool/dbconfig/20210315-094920-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14832 and previous config saved to /var/cache/conftool/dbconfig/20210315-093416-root.json
  • 09:23 vgutierrez: rolling restart of LVS cluster to bump depool_threshold to 0.8 on text & upload clusters - T274888
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14831 and previous config saved to /var/cache/conftool/dbconfig/20210315-091912-root.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14830 and previous config saved to /var/cache/conftool/dbconfig/20210315-090409-root.json
  • 08:54 marostegui: Stop MySQL on db1136 T277007
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 T277007', diff saved to https://phabricator.wikimedia.org/P14829 and previous config saved to /var/cache/conftool/dbconfig/20210315-085409-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14828 and previous config saved to /var/cache/conftool/dbconfig/20210315-083555-marostegui.json
  • 08:33 godog: swift eqiad-prod remove decom hosts from account/container rings - T272836 T276193
  • 08:33 marostegui: Repool labsdb1009 T276980
  • 07:22 elukey: powercycle ms-be1038 - no ssh, no tty available in mgmt serial console, irrecoverable error saved in ilo's system logs
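
The 12:10 to 12:23 entries above run RemoveDeletedItemsFromTermStore.php over line ranges of T270249.ids that grow tenfold per batch (50, 500, 5000, then the remaining 4039 items). The sketch below reproduces those ranges and the `sed | tr | sed` ID-joining step in Python. It is an illustration only: `batch_ranges`, `join_ids`, and the sample IDs are hypothetical names and values, not part of the maintenance script.

```python
def batch_ranges(first_line, last_line, initial_size=50, factor=10):
    """Yield inclusive (start, end) line ranges whose size grows by `factor`."""
    start, size = first_line, initial_size
    while start <= last_line:
        # Clamp the final batch to the end of the file.
        end = min(start + size - 1, last_line)
        yield start, end
        start, size = end + 1, size * factor


def join_ids(lines, start, end):
    """Rough equivalent of: sed -n START,ENDp | tr '\\n' ',' | sed 's/,$//'."""
    # sed line numbers are 1-based and inclusive, hence the start - 1.
    return ",".join(line.rstrip("\n") for line in lines[start - 1:end])


# Line numbers taken from the log entries: first batch starts at line 5,
# last batch ends at line 9593.
ranges = list(batch_ranges(5, 9593))

# Hypothetical sample input standing in for the contents of T270249.ids.
sample_lines = ["Q1\n", "Q2\n", "Q3\n"]
joined = join_ids(sample_lines, 2, 3)
```

Starting with a small batch and scaling up keeps an unexpected failure cheap, while the later, larger batches finish the bulk of the work quickly.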

2021-03-14

  • 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14827 and previous config saved to /var/cache/conftool/dbconfig/20210314-175751-root.json
  • 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14826 and previous config saved to /var/cache/conftool/dbconfig/20210314-174248-root.json
  • 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14825 and previous config saved to /var/cache/conftool/dbconfig/20210314-172744-root.json
  • 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14824 and previous config saved to /var/cache/conftool/dbconfig/20210314-171240-root.json
  • 14:43 gehel: depool wdqs1005 and restart blazegraph - will keep depooled until this server has caught up on lag
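
The db1146:3314 entries above show a staged repool: 25%, 50%, 75%, and 100%, roughly 15 minutes apart. A minimal sketch of such a schedule follows. It only computes timestamps and makes no dbctl calls; `repool_schedule` and its parameters are hypothetical, and the 15-minute step is inferred from the logged times rather than from any tool configuration.

```python
from datetime import datetime, timedelta


def repool_schedule(start, percentages=(25, 50, 75, 100), step_minutes=15):
    """Return (time, percentage) pairs for a staged repool starting at `start`."""
    return [
        (start + timedelta(minutes=i * step_minutes), pct)
        for i, pct in enumerate(percentages)
    ]


# First repool step logged for db1146:3314 on 2021-03-14.
schedule = repool_schedule(datetime(2021, 3, 14, 17, 12))
for when, pct in schedule:
    print(f"{when:%H:%M} (re)pooling @ {pct}%")
```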

2021-03-13

  • 19:02 Amir1: change default charset of all core tables in labstestwiki to binary (T269348)
  • 18:53 Amir1: run schema changes for varbinary on wikitech (T269348)
  • 17:38 twentyafterfour: restarted apache on gerrit1001 to resolve apache worker exhaustion see T277127
  • 16:57 Reedy: gerrit web interface is slow/timing out
  • 01:18 ryankemper: T266470 Re-enabled icinga service notifications for `Check no envoy runtime configuration is left persistent` on `wdqs100[9,10]`
  • 01:04 ryankemper: T266470 merged https://gerrit.wikimedia.org/r/c/operations/dns/+/668255 && `ryankemper@authdns1001:~$ sudo authdns-update`
  • 00:55 mutante: [wdqs1009:/etc/envoy] $ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/

2021-03-12

  • 22:53 ryankemper: T266470 Manually disabled service notifications for `Check no envoy runtime configuration is left persistent`, will need to circle back on Monday to restore notifications
  • 22:10 legoktm: imported mailman-puppetmaster.mailman.eqiad1.wikimedia.cloud facts to puppet-compiler
  • 21:52 mutante: puppetmaster1001 sudo puppet cert clean testreduce.discovery.wmnet (T266509)
  • 21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2219.codfw.wmnet
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2219.codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2218.codfw.wmnet
  • 20:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2218.codfw.wmnet
  • 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2217.codfw.wmnet
  • 20:22 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2217.codfw.wmnet
  • 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2219.codfw.wmnet
  • 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2218.codfw.wmnet
  • 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2217.codfw.wmnet
  • 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2376.codfw.wmnet,service=canary
  • 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2374.codfw.wmnet,service=canary
  • 19:47 ebernhardson: start in-place reindex testwiki in eqiad, codfw, cloudelastic cirrus clusters for T269493
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
  • 19:41 mutante: mw2374, mw2376 - depooling to turn them into canaries
  • 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
  • 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
  • 19:09 cstone: tools revision changed from 532f8ecb33 to b7b4060c30
  • 18:28 bblack: authdns1001.wikimedia.org,dns2001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
  • 18:24 bblack: dns[15]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
  • 18:21 bblack: dns[34]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
  • 18:03 mutante: depooling mw2244,mw2245 (API on old hardware), mw2229,mw2230 (app on old hardware) - T277119
  • 18:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
  • 18:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
  • 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
  • 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
  • 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
  • 17:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
  • 16:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14818 and previous config saved to /var/cache/conftool/dbconfig/20210312-143450-root.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14817 and previous config saved to /var/cache/conftool/dbconfig/20210312-141947-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14816 and previous config saved to /var/cache/conftool/dbconfig/20210312-140443-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14815 and previous config saved to /var/cache/conftool/dbconfig/20210312-134940-root.json
  • 13:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1088.eqiad.wmnet
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1088.eqiad.wmnet
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P14814 and previous config saved to /var/cache/conftool/dbconfig/20210312-131033-marostegui.json
  • 12:12 vgutierrez: restart ats-tls on cp3051
  • 11:55 effie: upgrade memcached on mc1022, mc2022
  • 11:22 hnowlan: corrected git_server for logstash-logback-encoder, cassandra/twcs and cassandra/metrics-collector on deploy1002
  • 09:45 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:45 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 09:44 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:43 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
  • 09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
  • 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
  • 09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
  • 09:07 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 01m 35s)
  • 09:05 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
  • 09:00 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 00m 09s)
  • 09:00 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
  • 08:59 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 00m 10s)
  • 08:59 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
  • 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 08:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
  • 08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
  • 08:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
  • 08:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
  • 08:01 moritzm: installing openjpeg2 security updates
  • 07:16 marostegui: Stop mysql on db2108 to clone db2148
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T276742', diff saved to https://phabricator.wikimedia.org/P14811 and previous config saved to /var/cache/conftool/dbconfig/20210312-071628-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14810 and previous config saved to /var/cache/conftool/dbconfig/20210312-071400-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 T276742', diff saved to https://phabricator.wikimedia.org/P14809 and previous config saved to /var/cache/conftool/dbconfig/20210312-070219-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 60%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14808 and previous config saved to /var/cache/conftool/dbconfig/20210312-065857-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for table checking T276742', diff saved to https://phabricator.wikimedia.org/P14807 and previous config saved to /var/cache/conftool/dbconfig/20210312-065008-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 30%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14806 and previous config saved to /var/cache/conftool/dbconfig/20210312-064353-root.json
  • 06:30 marostegui: Deploy schema change on s2 codfw master, lag will appear - T276150 T276156
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14805 and previous config saved to /var/cache/conftool/dbconfig/20210312-062850-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P14804 and previous config saved to /var/cache/conftool/dbconfig/20210312-061306-marostegui.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1088 from dbctl T276025', diff saved to https://phabricator.wikimedia.org/P14803 and previous config saved to /var/cache/conftool/dbconfig/20210312-061118-marostegui.json
  • 04:14 eileen: tools revision changed from d64b2f8cee to 532f8ecb33
  • 01:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2215.codfw.wmnet
  • 00:58 mutante: shutting down mw2215
  • 00:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
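
Cookbook runs appear in this log as a START entry followed, higher up, by a matching END entry. The sketch below pairs them to estimate run durations in minutes. It is a hypothetical helper, not part of spicerack or the cookbooks: the regular expression is an assumption based on the entry format visible here, and runs that cross midnight are not handled.

```python
import re
from datetime import datetime

# Assumed entry shape: "HH:MM actor: START|END (PASS|FAIL) - Cookbook <rest>".
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<actor>\S+): "
    r"(?P<state>START|END \((?P<result>PASS|FAIL)\)) - Cookbook (?P<rest>.+)"
)


def cookbook_durations(entries):
    """Pair each END with its START; `entries` are ordered newest first."""
    pending_end = {}
    durations = []
    for line in entries:
        m = ENTRY.search(line)
        if not m:
            continue
        # Drop the "(exit_code=N)" marker so START and END share one key.
        key = (m["actor"], re.sub(r" \(exit_code=\d+\)", "", m["rest"]))
        when = datetime.strptime(m["time"], "%H:%M")
        if m["state"].startswith("END"):
            pending_end[key] = (when, m["result"])
        elif key in pending_end:
            end, result = pending_end.pop(key)
            minutes = int((end - when).total_seconds() // 60)
            durations.append((key[1], result, minutes))
    return durations


# Two entries copied from the 2021-03-12 section above.
sample = [
    "21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2219.codfw.wmnet",
    "20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2219.codfw.wmnet",
]
durations = cookbook_durations(sample)
```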

2021-03-11

  • 22:55 mutante: depooled mw2224 through mw2228 but not removing from DSH groups yet (T277119)
  • 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
  • 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
  • 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
  • 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2225.codfw.wmnet
  • 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2224.codfw.wmnet
  • 22:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:47 mutante: running DNS cookbook in an attempt to remove mw2216
  • 22:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2216.codfw.wmnet
  • 22:41 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
  • 22:36 brennen: train status: 1.36.0-wmf.34 (T274938): T277229 and T266517 related issues hopefully resolved, rolling forward to all wikis
  • 22:34 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: Do not log script errors without file uri (T266517) (duration: 01m 07s)
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:30 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/includes/: Backport: Revert "Fix: Save user options only once when Advanced Mode is toggled" (T277229) (duration: 01m 09s)
  • 22:28 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:57 Amir1: run populate pages in cognate (T259360)
  • 21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
  • 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
  • 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
  • 21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
  • 21:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.36.0-wmf.34"
  • 21:20 brennen: train status: 1.36.0-wmf.34 (T274938): rolling back to group1 and marking T277229 a train blocker
  • 21:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
  • 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
  • 21:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable GrowthExperiments link recommendations on testwiki (gerrit:670858, T277173) (duration: 00m 59s)
  • 21:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@3810277]: T273847 export queries to relforge dag deployment - correct start date (duration: 01m 53s)
  • 21:12 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@3810277]: T273847 export queries to relforge dag deployment - correct start date
  • 21:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2216.codfw.wmnet
  • 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw2215.codfw.wmnet
  • 21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 21:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
  • 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
  • 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
  • 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
  • 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
  • 21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:58 mutante: deactivating codfw API canaries on old hardware (T277119)
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2216.codfw.wmnet
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2215.codfw.wmnet
  • 20:50 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:46 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@cc478d4]: T273847 export queries to relforge dag deployment (duration: 02m 09s)
  • 20:44 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@cc478d4]: T273847 export queries to relforge dag deployment
  • 20:35 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 20:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 20:28 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
  • 20:20 mutante: phab1001 - systemctl start phabricator_clean_tmp_files - now Succeeded
  • 20:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
  • 20:13 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
  • 20:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
  • 19:59 mutante: phab1001 - sudo systemctl start phabricator_clean_tmp_files (manually run after conversion from cron to timer, and it fails with permission issues)
  • 19:55 tgr_: T277173 running mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki GrowthExperiments
  • 19:54 tgr@deploy1002: Synchronized wmf-config/: Config: Configure GrowthExperiments Add Link settings, step 2 (T277173) (duration: 01m 08s)
  • 19:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 tgr@deploy1002: Synchronized wmf-config/: Config: Configure GrowthExperiments Add Link settings, step 1 (T277173) (duration: 01m 08s)
  • 19:18 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikitech: enable BetaFeatures (T125941) (duration: 01m 08s)
  • 19:13 hnowlan@deploy1002: Finished deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues (duration: 16m 25s)
  • 18:56 hnowlan@deploy1002: Started deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues
  • 18:47 tgr_: running mwscript extensions/GrowthExperiments/maintenance/importOresTopics.php testwiki --count 1000 --verbose --wikiId enwiki --apiUrl 'https://en.wikipedia.org/w/api.php'
  • 17:31 effie: install memcached 1.6.6-1 on mwdebug1001
  • 16:26 effie: upgrade memcached on mc1021, mc2021
  • 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14802 and previous config saved to /var/cache/conftool/dbconfig/20210311-161138-root.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 60%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14801 and previous config saved to /var/cache/conftool/dbconfig/20210311-155635-root.json
  • 15:53 cmjohnson1: updating firmware wdqs1009 T274751
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 30%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14800 and previous config saved to /var/cache/conftool/dbconfig/20210311-154131-root.json
  • 15:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14799 and previous config saved to /var/cache/conftool/dbconfig/20210311-152627-root.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P14798 and previous config saved to /var/cache/conftool/dbconfig/20210311-151435-marostegui.json
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14797 and previous config saved to /var/cache/conftool/dbconfig/20210311-150707-root.json
  • 14:55 klausman: restarting pybal on lvs2009 T272918
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 60%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14796 and previous config saved to /var/cache/conftool/dbconfig/20210311-145204-root.json
  • 14:50 klausman: restarting pybal on lvs1016 T272918
  • 14:49 klausman: restarting pybal on lvs2010 T272918
  • 14:46 moritzm: installing openssl (1.1) security updates for stretch
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 30%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14795 and previous config saved to /var/cache/conftool/dbconfig/20210311-143700-root.json
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14794 and previous config saved to /var/cache/conftool/dbconfig/20210311-142157-root.json
  • 14:07 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14793 and previous config saved to /var/cache/conftool/dbconfig/20210311-140526-marostegui.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14792 and previous config saved to /var/cache/conftool/dbconfig/20210311-140328-root.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2149 into s3', diff saved to https://phabricator.wikimedia.org/P14791 and previous config saved to /var/cache/conftool/dbconfig/20210311-140119-marostegui.json
  • 13:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 60%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14790 and previous config saved to /var/cache/conftool/dbconfig/20210311-134825-root.json
  • 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 30%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14789 and previous config saved to /var/cache/conftool/dbconfig/20210311-133321-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14788 and previous config saved to /var/cache/conftool/dbconfig/20210311-131818-root.json
  • 13:04 moritzm: installing openssl1.0 security updates on stretch
  • 13:03 arturo: copy python-mwclient 0.8.4-1 from stretch-wikimedia to buster-wikimedia for T275865
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P14787 and previous config saved to /var/cache/conftool/dbconfig/20210311-130208-marostegui.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14786 and previous config saved to /var/cache/conftool/dbconfig/20210311-130103-root.json
  • 13:00 hnowlan: imported cassandra_2.2.6-wmf5 to buster-wikimedia
  • 12:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 60%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14785 and previous config saved to /var/cache/conftool/dbconfig/20210311-124559-root.json
  • 12:39 hnowlan: imported cassandra_2.2.6-wmf1 to buster-wikimedia
  • 12:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 12:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 30%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14783 and previous config saved to /var/cache/conftool/dbconfig/20210311-123056-root.json
  • 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 12:16 Lucas_WMDE: EU backport&config window done
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14782 and previous config saved to /var/cache/conftool/dbconfig/20210311-121552-root.json
  • 12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302 # T270249, finished in 1.124s
  • 12:12 Lucas_WMDE: finished in 1.124s real time
  • 12:12 Lucas_WMDE: start of lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302
  • 12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/LabsServices.php: Config: Update comment for irc.beta.wmflabs.org (T277081) (comment-only beta-only change) (duration: 01m 13s)
  • 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 12:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix obsolete comments on wgCheckUserLogLogins (T253802) (duration: 01m 08s)
  • 12:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14781 and previous config saved to /var/cache/conftool/dbconfig/20210311-120554-marostegui.json
  • 12:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 11:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 11:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 11:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 11:37 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 11:35 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 11:31 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 11:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14778 and previous config saved to /var/cache/conftool/dbconfig/20210311-112747-root.json
  • 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 11:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 11:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 60%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14777 and previous config saved to /var/cache/conftool/dbconfig/20210311-111243-root.json
  • 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 10:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 30%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14776 and previous config saved to /var/cache/conftool/dbconfig/20210311-105740-root.json
  • 10:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
  • 10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14775 and previous config saved to /var/cache/conftool/dbconfig/20210311-104236-root.json
  • 10:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
  • 10:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
  • 10:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
  • 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
  • 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14774 and previous config saved to /var/cache/conftool/dbconfig/20210311-101714-marostegui.json
  • 10:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
  • 10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2149 to dbctl, depooled, T275633', diff saved to https://phabricator.wikimedia.org/P14773 and previous config saved to /var/cache/conftool/dbconfig/20210311-101604-marostegui.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14772 and previous config saved to /var/cache/conftool/dbconfig/20210311-101008-root.json
  • 10:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P14771 and previous config saved to /var/cache/conftool/dbconfig/20210311-100705-marostegui.json
  • 10:00 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 60%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14770 and previous config saved to /var/cache/conftool/dbconfig/20210311-095504-root.json
  • 09:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
  • 09:45 marostegui: Deploy schema change on s5 codfw master, lag will appear - T276150 T276156
  • 09:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 30%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14769 and previous config saved to /var/cache/conftool/dbconfig/20210311-094000-root.json
  • 09:35 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
  • 09:31 hashar: Restarting CI Jenkins
  • 09:29 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14768 and previous config saved to /var/cache/conftool/dbconfig/20210311-092457-root.json
  • 09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
  • 09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
  • 09:19 effie: upgrade memcached on mc1020, mc2020
  • 09:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
  • 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
  • 09:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P14767 and previous config saved to /var/cache/conftool/dbconfig/20210311-090342-marostegui.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14766 and previous config saved to /var/cache/conftool/dbconfig/20210311-090312-root.json
  • 09:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
  • 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
  • 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
  • 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
  • 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 60%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14765 and previous config saved to /var/cache/conftool/dbconfig/20210311-084809-root.json
  • 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
  • 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
  • 08:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 30%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14764 and previous config saved to /var/cache/conftool/dbconfig/20210311-083305-root.json
  • 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P14762 and previous config saved to /var/cache/conftool/dbconfig/20210311-082546-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2074', diff saved to https://phabricator.wikimedia.org/P14761 and previous config saved to /var/cache/conftool/dbconfig/20210311-082528-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P14760 and previous config saved to /var/cache/conftool/dbconfig/20210311-082445-marostegui.json
  • 08:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
  • 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 10%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14759 and previous config saved to /var/cache/conftool/dbconfig/20210311-081801-root.json
  • 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108 T275633', diff saved to https://phabricator.wikimedia.org/P14758 and previous config saved to /var/cache/conftool/dbconfig/20210311-081010-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2148 to s2 T275633', diff saved to https://phabricator.wikimedia.org/P14757 and previous config saved to /var/cache/conftool/dbconfig/20210311-080944-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P14756 and previous config saved to /var/cache/conftool/dbconfig/20210311-074352-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14755 and previous config saved to /var/cache/conftool/dbconfig/20210311-073741-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 60%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14754 and previous config saved to /var/cache/conftool/dbconfig/20210311-072237-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 30%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14753 and previous config saved to /var/cache/conftool/dbconfig/20210311-070734-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 10%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14752 and previous config saved to /var/cache/conftool/dbconfig/20210311-065230-root.json
  • 06:48 marostegui: Stop mysql on db2108 to clone db2148 T275633
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T275633', diff saved to https://phabricator.wikimedia.org/P14750 and previous config saved to /var/cache/conftool/dbconfig/20210311-064821-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P14749 and previous config saved to /var/cache/conftool/dbconfig/20210311-063814-marostegui.json
  • 06:36 marostegui: Drop testreduce from m5 - T276787
  • 05:34 thcipriani: restarted apache2 on gerrit1001
  • 00:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
  • 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
  • 00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
  • 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
  • 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet

2021-03-10