You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log

From Wikitech-static
Revision as of 20:42, 14 May 2021 by imported>Stashbot (dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet)

2021-05-14

  • 20:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet
  • 20:32 mutante: people1002 - decom'ing - please use people1003 and see list mail
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1002.eqiad.wmnet
  • 18:58 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:58 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:39 cdanis: ✔️ cdanis@install1003.wikimedia.org ~ 🕝☕ sudo systemctl restart squid.service
  • 18:14 mutante: people1003/people2002: awk -F: '$6 ~ "^\/home" {print $1,$6}' /etc/passwd | while read line ; do user=${line% *}; dir=${line#* }; sudo mkdir -p ${dir}/public_html; sudo chown $user ${dir}/public_html; done (courtesy of Jbond)
  • 17:49 bblack: install1003 - restored normal resolv.conf + re-enabled+ran puppet
  • 17:41 bblack: install1003 - restart squid
  • 17:35 bblack: install1003 - puppet disabled and /etc/resolv.conf manually patched over to deal with a current issue
  • 17:25 cdanis: rolled back cr1-eqiad/cr2-eqiad interface disables T282881
  • 17:10 cdanis: cdanis@re0.cr1-eqiad# set interfaces gr-3/3/0.1 disable # T282881
  • 17:03 cdanis: cdanis@re0.cr2-eqiad# set interfaces gr-4/3/0.2 disable # T282881
  • 15:22 cdanis@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:22 cdanis@cumin2002: START - Cookbook sre.network.cf
  • 15:05 Urbanecm: Start server-side upload for 1 video file (T282874)
  • 14:09 andrew@deploy1002: Finished deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard (duration: 04m 15s)
  • 14:04 andrew@deploy1002: Started deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard
  • 12:54 bblack: re-running puppet agent on cp5*
  • 12:19 jbond42: run puppet on CP servers
  • 04:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisionlist/RevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisiondelete/RevDelRevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:18 ariel@deploy1002: Finished deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path (duration: 00m 03s)
  • 04:18 ariel@deploy1002: Started deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path
  • 03:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/MapSources/includes/specials/MapSourcesPage.php: fix PHP notice T282833 (duration: 01m 07s)
  • 03:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/WikiPage.php: T282844 (duration: 01m 06s)
  • 03:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/PageArchive.php: T282844 (duration: 01m 07s)
  • 03:16 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/Revision/RevisionArchiveRecord.php: fix DeletedContributions breakage T282844 (duration: 01m 07s)
  • 03:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/logging/LogEventsList.php: fix PHP notice T282834 (duration: 01m 08s)
  • 00:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
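
The 18:14 entry above logs an awk/shell one-liner that creates a public_html directory for every account whose home directory is under /home. The selection step can be sketched in Python as follows. This is only an illustration: the sample passwd lines and the function name public_html_targets are hypothetical, and the sketch prints the commands it would run rather than touching the filesystem.

```python
def public_html_targets(passwd_text):
    """Return (user, public_html_path) pairs for accounts whose home starts with /home."""
    targets = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        user, home = fields[0], fields[5]
        # Same test as the awk pattern: field 6 begins with "/home"
        if home.startswith("/home"):
            targets.append((user, f"{home}/public_html"))
    return targets


# Hypothetical passwd content, for illustration only
SAMPLE = """root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001:Alice:/home/alice:/bin/bash
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
bob:x:1002:1002:Bob:/home/bob:/bin/bash"""

for user, path in public_html_targets(SAMPLE):
    # Dry run: show what would be created and who would own it
    print(f"mkdir -p {path} && chown {user} {path}")
```

Keeping the selection separate from the mkdir/chown step makes it possible to review the list of affected accounts before anything is changed.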

2021-05-13

  • 23:53 mutante: [sodium:~] $ sudo systemctl start update-ubuntu-mirror.service
  • 23:50 mutante: [sodium:~] $ sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 23:22 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikimediaEvents: Backport: Fix "final_state: vector" bug in VectorPrefDiffInstrumentation (T261842) (duration: 01m 07s)
  • 23:11 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WikiLove extension on tawiki (T280326) (duration: 01m 07s)
  • 23:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 9dc74e4: Revert "Enable media change tags on wikipedias" (T266067, T282822) (duration: 01m 07s)
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:39 dancy@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GeoData/includes/Hooks.php: Backport: Make sure mId exists (T282735) (duration: 01m 08s)
  • 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 80e5b9d: cd113a7: Enable structured_task/article/link_suggestion_interaction schema (T278177) (duration: 01m 06s)
  • 18:59 Urbanecm: Morning B&C is going to take a few more minutes
  • 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2001.codfw.wmnet
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 0856ae1: ca52e78: GrowthExperiments backports (T282711, T282175) (duration: 01m 08s)
  • 18:26 mutante: people2001 is going down - people1003 (eqiad) and people2002 (codfw) are your replacements on bullseye
  • 18:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2001.codfw.wmnet
  • 18:22 Urbanecm: Start server-side upload for 2 video files (T282643, T282644)
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4cd6a78: Growth features: Push elwiki and cawiki out of dark mode (T280673; T280172) (duration: 01m 07s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04eb9d3: Enable media change tags on wikipedias (T266067) (duration: 01m 07s)
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3300c3: 59c8448: Enable Extension:MediaSearch on (test)commons (T265939) (duration: 01m 08s)
  • 17:20 andrew@deploy1002: Finished deploy [horizon/deploy@3d160f6]: Adding Database dashboards (duration: 04m 08s)
  • 17:16 andrew@deploy1002: Started deploy [horizon/deploy@3d160f6]: Adding Database dashboards
  • 16:36 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1005 back to config (T273278) (duration: 01m 07s)
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 16:24 effie: rebooting poolcounter1005
  • 16:09 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1005 will be rebooted for updates (T273278) (duration: 01m 07s)
  • 15:58 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1004 back to config (T273278) (duration: 01m 07s)
  • 15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 15:46 effie: restarting poolcounter1004
  • 15:27 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1004 will be rebooted for updates (T273278) (duration: 01m 08s)
  • 14:49 Urbanecm: Start server-side upload for 1 video file (T282785)
  • 14:07 Urbanecm: Start server-side upload for 3 video files (T282558, T282556)
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches (gerrit:690070 gerrit:690071 gerrit:690072 gerrit:690073) (T278116 T278117 T278114 T278177 T278487 T278112 T278111 T278118) (duration: 01m 09s)
  • 11:00 hnowlan: deleting packages still referenced by jessie components: `sudo -i reprepro clearvanished --delete`
  • 10:46 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:40 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:31 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:21 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:43 kevinbazira@deploy1002: Finished deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723 (duration: 32m 50s)
  • 07:10 kevinbazira@deploy1002: Started deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723
  • 05:54 _joe_: running docker image prune on contint1001, which has 722 unlinked images stored in its docker daemon
  • 01:20 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
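
Cookbook runs appear in this log as START/END pairs, newest first. A minimal sketch of pairing them to get run times in minutes is shown below. The regular expression and the helper names are assumptions made for this example, not part of any Wikimedia tooling, and the sketch assumes both entries of a pair fall on the same day.

```python
import re

# Matches "HH:MM user@host: START - Cookbook ..." and "... END (PASS|FAIL) - Cookbook ..."
ENTRY = re.compile(
    r"^(?P<hh>\d{2}):(?P<mm>\d{2}) (?P<who>\S+): "
    r"(?P<phase>START|END \((?P<result>PASS|FAIL)\)) - Cookbook (?P<rest>.+)$"
)


def cookbook_key(rest):
    # END lines carry an extra "(exit_code=N)" token; drop it so START and END compare equal
    return re.sub(r"\(exit_code=\d+\) ?", "", rest).strip()


def cookbook_durations(lines):
    """Pair END and START entries (newest-first input); return (cookbook, result, minutes)."""
    pending_end = {}
    durations = []
    for line in lines:
        m = ENTRY.match(line.strip().lstrip("•").strip())
        if not m:
            continue  # not a cookbook START/END entry
        minute_of_day = int(m["hh"]) * 60 + int(m["mm"])
        key = (m["who"], cookbook_key(m["rest"]))
        if m["phase"] == "START":
            end = pending_end.pop(key, None)
            if end is not None:
                end_minute, result = end
                durations.append((key[1], result, end_minute - minute_of_day))
        else:
            pending_end[key] = (minute_of_day, m["result"])
    return durations


# Entries copied from the 2021-05-13 section above
LINES = [
    "16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet",
    "16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet",
    "15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet",
    "15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet",
]

for cookbook, result, minutes in cookbook_durations(LINES):
    print(f"{cookbook}: {result} after {minutes} min")
```

An END entry whose START lies outside the supplied lines is simply left unpaired and not reported.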

2021-05-12

  • 23:48 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 07s)
  • 23:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: ef41396: PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 08s)
  • 23:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:27 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 21:54 ryankemper: T280382 `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 20:57 ottomata: starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - T273789
  • 20:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:25 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:15 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:11 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:10 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:07 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:05 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 19:00 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin2001` tmux session `elastic_restarts`
  • 19:00 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 18:59 ryankemper: [Elastic] Restarted `*search*` services on `elastic2058`
  • 18:48 mutante: rsyncing home dirs of people1003 over to people2002 as well (T280989)
  • 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 3999be1: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s)
  • 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb65aff: Update wordmark and tagline for kawiki (T278251; 2/2) (duration: 01m 09s)
  • 18:26 urbanecm@deploy1002: Synchronized static/images/mobile/: eb65aff: Update wordmark and tagline for kawiki (T278251; 1/2) (duration: 01m 06s)
  • 18:25 urbanecm@deploy1002: sync-file aborted: eb65aff: Update wordmark and tagline for kawiki (T278251) (duration: 00m 00s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0cd3297: Disable Education Program namespaces in cswiki (T282691) (duration: 01m 15s)
  • 18:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: 7f14913: Modern keys must be unset (T282646) (duration: 01m 08s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11defd4: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 23s)
  • 16:15 hnowlan: including envoyproxy_1.15.5-1_amd64.changes with reprepro
  • 15:51 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet
  • 14:45 aborrero@cumin2001: START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet
  • 14:02 marostegui: Upgraded mysql on clouddb1015
  • 14:01 marostegui: Upgraded mysql on clouddb1014
  • 13:57 kormat: uploaded wmfmariadbpy 0.6.1 for bullseye
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json
  • 13:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:06 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:05 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json
  • 12:44 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:42 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json
  • 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 6cc2530: c268d08: b89592e: 7620953: 8fd7610: GrowthExperiments backports (duration: 01m 17s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json
  • 11:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9939edb: zhwikinews: Allow sysops to grant/revoke transwiki group (T273405) (duration: 02m 17s)
  • 10:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:32 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 10:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 10:01 effie: reboot poolcounter2003 and poolcounter2004
  • 09:55 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1074 from dbctl T281959', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
  • 08:23 jbond42: rolling restart of ats
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 T282535', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 T282535', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
  • 05:00 marostegui: Stop MySQL on labsdb1009 labsdb1010 labsdb1011 T282524 T282523 T282522
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T282535', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T282535', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
  • 04:38 marostegui: Drop testing mailman3 databases T281548
  • 04:36 Amir1: importing archives of wikitech-l (T280322)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:35 mutante: people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run T280989
  • 01:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 08s)
  • 01:17 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 16s)
  • 00:54 mutante: made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
  • 00:51 mutante: [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \;
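
The dbctl entries above show hosts being repooled in four steps (25%, 50%, 75%, 100%), with timestamps roughly 15 minutes apart; db1174, for example, was logged at 12:47, 13:02, 13:17 and 13:32. The sketch below reproduces that timeline. The 15-minute interval is inferred from the log timestamps, and repool_schedule is a hypothetical helper for illustration, not how dbctl or the repooling script is implemented.

```python
from datetime import datetime, timedelta


def repool_schedule(first_step, percentages=(25, 50, 75, 100), step_minutes=15):
    """Return (HH:MM, percentage) pairs for a gradual repool starting at first_step."""
    start = datetime.strptime(first_step, "%H:%M")
    return [
        ((start + timedelta(minutes=i * step_minutes)).strftime("%H:%M"), pct)
        for i, pct in enumerate(percentages)
    ]


# Start time taken from the first db1174 repool entry in the log
for when, pct in repool_schedule("12:47"):
    print(f"{when} db1174 (re)pooling @ {pct}%")
```

Some hosts in the log drift by a minute between steps (db1098:3317 goes from 08:47 to 09:03), so the fixed interval should be read as an approximation.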

2021-05-11

  • 23:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ec37795: Change namespace names and aliases on tiwiki and tiwiktionary (T263840) (duration: 01m 07s)
  • 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5bc40ac: ptwiki: Use celebration logos in new vector (T281925) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eac843a: Make DT source mode toolbar available as beta on all wikis (T279124) (duration: 01m 12s)
  • 23:06 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-pt-20.png: 60e6e4e: ptwiki: Add wikipedia-pt-20.png (T281925) (duration: 01m 08s)
  • 23:02 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: e35199b: Adding square logo and wordmark for ptwiki 20 years celebration (T281925) (duration: 01m 50s)
  • 22:14 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lists1002.wikimedia.org
  • 22:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts lists1002.wikimedia.org
  • 21:37 Urbanecm: Start server-side upload for 3 video files (T282566, T282565, T282559)
  • 21:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 21:34 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 20:52 legoktm: upgraded mailman3 on lists1001
  • 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2002.codfw.wmnet
  • 20:24 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 06m 57s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 00m 05s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 17m 01s)
  • 20:00 mforns@deploy1002: Started deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 19:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2002.codfw.wmnet
  • 19:46 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 09m 45s)
  • 19:37 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:33 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.5
  • 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 00m 07s)
  • 19:29 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:28 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 45m 45s)
  • 18:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:53 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on testwiki - T238138 (duration: 01m 09s)
  • 18:52 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 18:20 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.5 (duration: 09m 43s)
  • 18:10 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 17:36 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 01m 25s)
  • 17:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:35 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:34 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 02m 27s)
  • 17:33 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:32 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:31 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev (duration: 01m 59s)
  • 17:29 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev
  • 17:20 mutante: the backend for people.wikimedia.org switched from people1002 to people1003, the people.wikimedia.org CNAME has been updated. MOTD is about to be updated to inform users.
  • 17:18 legoktm: disabled pipermail redirects on lists.wikimedia.org
  • 17:07 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:12 jynus: restarting bacula-dir on backup1001, stuck process
  • 15:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 15:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog1001.eqiad.wmnet
  • 15:55 bstorm: restart haproxy on dbproxy1018/9 to remove old config
  • 15:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 15:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog2001.codfw.wmnet
  • 15:37 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:36 dancy@deploy1002: sync-world aborted: testwikis wikis to 1.37.0-wmf.4 (duration: 02m 04s)
  • 15:34 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 dancy@deploy1002: scap failed: RuntimeError scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details) (duration: 17m 36s)
  • 15:31 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:27 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog2001.codfw.wmnet
  • 15:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 15:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:57 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:49 moritzm: installing busybox security updates
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:27 moritzm: installing cgal security updates
  • 14:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 hashar: Restarted CI Jenkins with a snapshot of the Gearman Jenkins plugin # T281737
  • 14:10 hashar: Restarted CI Jenkins for plugin upgrade # T282433
  • 14:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:01 hashar: Restarted releases Jenkins for plugin upgrade # T282433
  • 13:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1d4d007: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 02s)
  • 13:39 jbond42: rolling restart of ats-backend
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1027.eqiad.wmnet
  • 12:11 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1027.eqiad.wmnet
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913 and previous config saved to /var/cache/conftool/dbconfig/20210511-114540-root.json
  • 11:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912 and previous config saved to /var/cache/conftool/dbconfig/20210511-113036-root.json
  • 11:16 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add P2671 and P4839 to deprecated properties list (T280779) (duration: 00m 58s)
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911 and previous config saved to /var/cache/conftool/dbconfig/20210511-111532-root.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910 and previous config saved to /var/cache/conftool/dbconfig/20210511-110029-root.json
  • 10:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:46 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P15909 and previous config saved to /var/cache/conftool/dbconfig/20210511-102303-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15908 and previous config saved to /var/cache/conftool/dbconfig/20210511-102212-root.json
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15907 and previous config saved to /var/cache/conftool/dbconfig/20210511-100708-root.json
  • 09:54 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw2002-dev.codfw.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15904 and previous config saved to /var/cache/conftool/dbconfig/20210511-095204-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15903 and previous config saved to /var/cache/conftool/dbconfig/20210511-093701-root.json
  • 09:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 08:37 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:36 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:34 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:32 moritzm: installing hivex security updates
  • 08:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15901 and previous config saved to /var/cache/conftool/dbconfig/20210511-082038-marostegui.json
  • 08:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:55 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15899 and previous config saved to /var/cache/conftool/dbconfig/20210511-070742-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15898 and previous config saved to /var/cache/conftool/dbconfig/20210511-065238-root.json
  • 06:50 marostegui: Stop replication on db2094:3318 T282514
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15897 and previous config saved to /var/cache/conftool/dbconfig/20210511-063734-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15896 and previous config saved to /var/cache/conftool/dbconfig/20210511-062231-root.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1082.eqiad.wmnet
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1082.eqiad.wmnet
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:11 marostegui: Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15894 and previous config saved to /var/cache/conftool/dbconfig/20210511-050816-marostegui.json
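
Note on the staged repool entries above: each repooled instance (for example db1162 or db1170:3312) is logged at 25%, 50%, 75% and 100%, roughly 15 minutes apart. The sketch below is one possible way to pull those steps out of a single day's entries for review. It is not part of dbctl or any Wikimedia tooling; the regular expression, the `repool_steps` helper, and the `sample` list are illustrative assumptions based only on the line format visible in this log.

```python
import re

# Hypothetical pattern, derived from entries in this log such as:
#   11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', ...
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {instance: [(time, percent), ...]}, oldest step first.

    Only meaningful for entries from a single day, because the log
    records HH:MM without a date on each line.
    """
    steps = {}
    for line in lines:
        m = REPOOL_RE.search(line)
        if m:
            steps.setdefault(m["instance"], []).append((m["time"], int(m["pct"])))
    # The log lists the newest entry first, so sort each list by time.
    for entries in steps.values():
        entries.sort()
    return steps


# Four entries copied from the 2021-05-11 section (trailing text shortened).
sample = [
    "11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913",
    "11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912",
    "11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911",
    "11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910",
]

steps = repool_steps(sample)
print(steps)
```

Keying on the instance name keeps multi-instance hosts such as db1170:3312 separate from single-instance hosts, since the port is part of the captured name.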

2021-05-10

  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53: Update messages used for tech CoC (T280886) (duration: 00m 56s)
  • 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: ba8b786: NO-OP: Enable ChessBrowser on beta (T244075) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: dd6fa65: Use ptwiki 20th anniversary logos (T281925) (duration: 00m 59s)
  • 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: f2a76b1: Add ptwiki 20th anniversary logos (T281925) (duration: 00m 58s)
  • 22:28 eileen: civicrm revision changed from 2052d79248 to 38ac15233f, config revision is 47f21e4568
  • 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 00m 56s)
  • 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 01m 01s)
  • 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
  • 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 T282348
  • 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
  • 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 02m 07s)
  • 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 55s)
  • 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 21s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 36s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 15s)
  • 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 04m 10s)
  • 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: loginwiki: Allow users to mark Notifications as read (T264834) (duration: 00m 57s)
  • 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Disable LocalisationUpdate, part I (T158360) (duration: 00m 58s)
  • 18:24 XioNoX: add cmooney to all network devices
  • 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Enable VE desktop section edit links (T280291) (duration: 00m 57s)
  • 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712) (duration: 00m 57s)
  • 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored (duration: 00m 57s)
  • 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
  • 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
  • 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565) (duration: 00m 59s)
  • 15:20 elukey: restart rsyslog on rpki1001
  • 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
  • 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
  • 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 11:46 Urbanecm: EU B&C window done
  • 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3418237: Disabling Education Program namespaces in Russian Wikipedia (T282112) (duration: 00m 57s)
  • 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8bef11c: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T282007) (duration: 00m 57s)
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # T262155
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # T262155
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 068cd7e: Change namespace name and aliases on jawikivoyage (T262155) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9209d96: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies (T281968) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7f6f849: Add tmpSerializeEmptyListsAsObjects to Wikibase.php (T241422) (duration: 01m 01s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6138c64: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T241422) (duration: 00m 57s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 23271dd: Enable ReferencePreviews as full default on Marathi wiki (T282147) (duration: 00m 57s)
  • 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (3/3; T281972) (duration: 00m 56s)
  • 11:08 urbanecm@deploy1002: sync-file aborted: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (T281972) (duration: 00m 04s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (2/3; T281972) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (1/3; T281972) (duration: 00m 57s)
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
  • 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:31 moritzm: installing openjdk-11 security updates
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
  • 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - T281673
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T281959', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
  • 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
  • 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - T281673
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
  • 08:24 XioNoX: push pfw policies - T282286
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
  • 07:38 hashar: Restarted CI Jenkins # T281737
  • 06:37 elukey: apt-get clean on rpki1001 to free some space
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl T281794', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json
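
Note on the cookbook entries above: each run is logged as a `START - Cookbook ...` line followed later by an `END (PASS|FAIL|ERROR) - Cookbook ... (exit_code=N)` line. The sketch below shows one way such pairs could be matched up to get a rough duration per run. It is an illustration only, not part of the cookbook or Spicerack tooling; the regular expressions and the `cookbook_runs` helper are assumptions based on the line format visible in this log, and the result is only as precise as the HH:MM timestamps.

```python
import re

# Hypothetical patterns, derived from the entry format seen in this log.
START_RE = re.compile(r"(\d{2}):(\d{2}) (\S+): START - Cookbook (\S+)(.*)")
END_RE = re.compile(
    r"(\d{2}):(\d{2}) (\S+): END \((\w+)\) - Cookbook (\S+) \(exit_code=(\d+)\)(.*)"
)


def cookbook_runs(lines):
    """Pair START/END entries from one day; return one dict per finished run."""
    open_runs = {}  # (who, cookbook, args) -> start time in minutes since midnight
    runs = []
    for line in reversed(lines):  # the log is newest-first, so walk it backwards
        m = START_RE.search(line)
        if m:
            hh, mm, who, cookbook, args = m.groups()
            open_runs[(who, cookbook, args.strip())] = int(hh) * 60 + int(mm)
            continue
        m = END_RE.search(line)
        if m:
            hh, mm, who, status, cookbook, code, args = m.groups()
            key = (who, cookbook, args.strip())
            if key in open_runs:
                started = open_runs.pop(key)
                runs.append({
                    "cookbook": cookbook,
                    "args": args.strip(),
                    "status": status,
                    "exit_code": int(code),
                    "minutes": int(hh) * 60 + int(mm) - started,
                })
    return runs


# Four entries copied from the 2021-05-10 section.
sample = [
    "08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet",
    "08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet",
    "08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet",
    "08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet",
]

runs = cookbook_runs(sample)
for run in runs:
    print(run["cookbook"], run["args"], run["status"], run["minutes"], "min")
```

An END entry with no matching START in the same day's lines is skipped rather than guessed at, which keeps runs that span midnight out of the result.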

2021-05-09

  • 21:44 legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
  • 18:28 legoktm: systemctl restart mailman3, bounce runner died again (T282348)
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 09:16 legoktm: mailman3 live hacked patch at https://phabricator.wikimedia.org/T282348#7072358 to fix bounce queue
  • 06:21 legoktm: restarting mailman3 service, bounce runner died
  • 04:27 Amir1: starting upgrade of batch H of mailing lists (T280322)

2021-05-08

  • 17:18 Amir1: starting upgrade of batch G of mailing lists (T280322)

2021-05-07

  • 21:40 legoktm: deleted education@ from MM3, didn't import properly
  • 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
  • 21:33 legoktm: fixed owner for wdqs-gui-build list
  • 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
  • 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 18:23 brennen: 1.37.0-wmf.4 train status (T281145): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
  • 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: LinkBatch: skip bad input (T282180 T282070) (duration: 01m 06s)
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
  • 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
  • 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
  • 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
  • 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
  • 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
  • 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
  • 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
  • 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 13:04 Urbanecm: Start server-side upload for 1 video file (T281927)
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
  • 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
  • 09:55 dcausse: depooling wdqs1012 T280382, T282222
  • 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - T281673
  • 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - T281673
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
  • 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear T266486 T268392 T273360
  • 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)
  • 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T282093', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json
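
The repool entries above follow a staged pattern: db1161, for example, goes to 25%, 50%, 75% and 100% at 07:02, 07:17, 07:32 and 07:47. A minimal sketch of that timing, assuming fixed 15-minute steps, is below. The function name `repool_schedule` and its parameters are hypothetical and are not taken from dbctl or any Wikimedia tooling.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool.

    Assumption: each step follows the previous one after a fixed interval,
    as the log timestamps suggest. This is an illustration only.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first db1161 repool entry (07:02 on 2021-05-07).
schedule = repool_schedule(datetime(2021, 5, 7, 7, 2))
for when, pct in schedule:
    print(f"{when:%H:%M} pool at {pct}%")
```

Other hosts in the log use different steps (db1126 goes through 10%, 20% and 30% first), which the `steps` argument could model.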

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart
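
The [Elastic] entries above ban and unban nodes by changing the cluster's allocation exclusion settings. A minimal sketch of building that request body, so the JSON is always balanced before it is passed to `curl -d`, is below. The helper name `exclusion_settings` is hypothetical, and the node names are copied from the log entries as written; the names actually registered in the cluster may differ.

```python
import json


def exclusion_settings(names=None, hosts=None):
    """Build a transient cluster-settings body for allocation exclusion.

    A filter left as None is serialized as JSON null, which clears it.
    """
    def joined(values):
        return ",".join(values) if values else None

    return {
        "transient": {
            "cluster.routing.allocation.exclude": {
                "_host": joined(hosts),
                "_name": joined(names),
            }
        }
    }


# Clear all exclusions (same shape as the 03:08 request in the log).
clear_body = json.dumps(exclusion_settings())
# Exclude two nodes by name (names as written in the log, an assumption).
ban_body = json.dumps(exclusion_settings(names=["elastic2033", "elastic2043"]))

print(clear_body)
print(ban_body)
```

Either string can be supplied as the body of a `PUT` to `/_cluster/settings`. As the 03:06 entry notes, excluding two nodes at once is risky when an index has only one replica.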

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message; will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" () (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123) GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

Archives

See Server Admin Log/Archives.