Server Admin Log

Revision as of 16:16, 5 June 2021 by imported>Stashbot (Amir1: deleting all private archives of mm2. All are inaccessible now (T282303))

2021-06-05

  • 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now (T282303)
  • 15:21 Amir1: delete mbox files of group D and E in mm2 (T282303)
  • 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
  • 00:18 mutante: backup1001 systemctl reload bacula-dir fails

2021-06-04

  • 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
  • 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
  • 20:59 bblack: repool cp1087 - T278729
  • 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 19:06 bblack: depool cp1087 - T278729
  • 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
  • 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I434d9c (duration: 00m 56s)
  • 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: Iea41ab (duration: 00m 56s)
  • 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: Iea41ab (duration: 00m 59s)
  • 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref T282761
  • 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:46 marostegui: Upgrade mysql on clouddb1016 T283235
  • 12:27 marostegui: Upgrade mysql on clouddb1015 T283235
  • 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
  • 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
  • 09:47 ema: pool cp1087 T278729
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
  • 09:06 ema: reboot cp1087 T278729
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16302 and previous config saved to /var/cache/conftool/dbconfig/20210604-084735-root.json
  • 08:33 marostegui: Upgrade db1110 T283235
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16301 and previous config saved to /var/cache/conftool/dbconfig/20210604-083232-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16300 and previous config saved to /var/cache/conftool/dbconfig/20210604-082956-marostegui.json
  • 08:20 godog: upgrade karma to 0.86-1
  • 07:38 jynus: stop and upgrade db1150 T283235
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16299 and previous config saved to /var/cache/conftool/dbconfig/20210604-073326-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16298 and previous config saved to /var/cache/conftool/dbconfig/20210604-073318-root.json
  • 07:29 moritzm: cleanup now unused nginx mods and former deps on install* and puppetdb* servers after switch towards nginx-light (various X11 libs and libxslt) T164456
  • 07:24 moritzm: cleanup now unused nginx mods and former deps on install* servers after switch towards nginx-light (various X11 libs and libxslt)
  • 07:19 urbanecm: Password reset for SUL User:Dominic_Mayers (T282656)
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16297 and previous config saved to /var/cache/conftool/dbconfig/20210604-071823-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16296 and previous config saved to /var/cache/conftool/dbconfig/20210604-071815-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16295 and previous config saved to /var/cache/conftool/dbconfig/20210604-070319-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16294 and previous config saved to /var/cache/conftool/dbconfig/20210604-070311-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16293 and previous config saved to /var/cache/conftool/dbconfig/20210604-064815-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16292 and previous config saved to /var/cache/conftool/dbconfig/20210604-064807-root.json
  • 06:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:42 marostegui: Upgrade mysql on db1096:3315 db1096:3316
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 db1096:3315', diff saved to https://phabricator.wikimedia.org/P16291 and previous config saved to /var/cache/conftool/dbconfig/20210604-064242-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16290 and previous config saved to /var/cache/conftool/dbconfig/20210604-055521-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16289 and previous config saved to /var/cache/conftool/dbconfig/20210604-054017-root.json
  • 05:26 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16288 and previous config saved to /var/cache/conftool/dbconfig/20210604-052514-root.json
  • 05:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:17 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:16 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json
  • 04:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:25 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:22 ryankemper: T280382 `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:33 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph`
  • 02:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:30 ryankemper: T280382 `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:25 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag)
  • 02:09 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 02:06 ebernhardson: post-deploy restart airflow-(webserver|scheduler) on an-airflow1001
  • 02:05 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s)
  • 02:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift
  • 01:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 57s)
  • 00:07 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 00:06 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
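
The dbctl entries for this day show each database host being repooled in steps (25%, 50%, 75%, 100%), roughly 15 minutes apart. The following is a minimal sketch, not part of dbctl or any Wikimedia tooling, of how those steps could be pulled out of log lines for review. The regular expression and the function name `repool_steps` are assumptions based only on the entry format visible in this log, and the sample lines are abbreviated copies of the db1110 entries.

```python
import re

# Assumed pattern, derived from entries such as:
# "09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', ..."
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time within one day."""
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            steps.setdefault(match["host"], []).append(
                (match["time"], int(match["pct"]))
            )
    # The log lists newest entries first, so sort each host's steps by HH:MM.
    for host in steps:
        steps[host].sort()
    return steps


# Abbreviated sample lines (text after the commit message is cut off).
sample = [
    "  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110'",
    "  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110'",
    "  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110'",
    "  • 08:33 marostegui: Upgrade db1110 T283235",
    "  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110'",
    "  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110'",
]

for host, host_steps in repool_steps(sample).items():
    print(host, host_steps)
```

Sorting on the HH:MM string only works within a single day's section; entries spanning midnight would need the date heading attached to each timestamp.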

2021-06-03

  • 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s)
  • 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s)
  • 23:33 mutante: installing OS on fresh VM doh5001
  • 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Restrict changetags to sysops and bots on meta T283625 (duration: 00m 58s)
  • 22:41 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 22:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper: T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
  • 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:35 ryankemper: T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
  • 20:47 sbassett: Deployed security patch for T282932
  • 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
  • 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
  • 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service
  • 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
  • 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
  • 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 - icinga-wm> RECOVERY - Check systemd state on deneb is OK
  • 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
  • 19:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
  • 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456
  • 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
  • 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
  • 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant (T164456)
  • 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
  • 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
  • 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
  • 17:16 urandom: remove dropped Cassandra keyspace snapshots -- T258414
  • 16:55 ejegg: updated payments-wiki from 6fac77f60e to 7be0534b91
  • 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
  • 15:27 papaul: pdu replacement complete
  • 15:25 moritzm: upgrading gitlab to 13.11.5
  • 15:08 papaul: disconnect ps2-d8-codfw for replacement
  • 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
  • 14:23 moritzm: installing nginx security updates on buster
  • 14:12 moritzm: installing postgresql-9.6 security updates
  • 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
  • 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
  • 12:03 moritzm: installing lz4 security updates on buster
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
  • 11:53 moritzm: installing curl security updates on stretch
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e840968: jawiki: extended confirmed should be 120 days since first edit, not registration (T284212) (duration: 00m 58s)
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
  • 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:41 godog: test librenms/AM paging
  • 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
  • 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
  • 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary T282761 (duration: 00m 58s)
  • 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - T282373 T282372 T282371
  • 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
  • 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
  • 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
  • 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
  • 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
  • 06:23 ryankemper: T280382 `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:23 ryankemper: T280382 `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - T266486 T268392 T273360
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
  • 04:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:36 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:30 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:29 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
  • 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:07 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:05 ryankemper: T280382 `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:51 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 01:47 ryankemper: T280382 `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
  • 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650), Try II (duration: 00m 57s)
  • 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: user: Accept options-messages for multiselect user options (T58633 T278650) (duration: 00m 57s)
  • 00:35 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:18 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
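
The cookbook entries above come in START / END pairs, so approximate run times can be read off the log. The following is a minimal sketch, not part of any Wikimedia tooling: the regular expression and the `pair_cookbook_runs` helper are hypothetical and assume the line shape seen in this log (`HH:MM user@host: START - Cookbook <name>` and `HH:MM user@host: END (<STATUS>) - Cookbook <name>`), entries listed newest-first, and all lines taken from a single day.

```python
import re
from datetime import datetime

# Assumed entry shape, based only on the lines in this log (hypothetical pattern).
LINE_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<who>\S+@\S+): "
    r"(?P<phase>START|END \((?P<status>\w+)\)) - Cookbook (?P<name>\S+)"
)


def pair_cookbook_runs(lines):
    """Pair each START with the nearest later END of the same user@host and cookbook.

    `lines` must be newest-first (as in the log) and from one day only.
    Returns a list of (who, cookbook, status, duration) tuples.
    """
    pending_ends = {}  # (who, cookbook) -> stack of (end_time, status)
    runs = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a cookbook START/END entry
        key = (m["who"], m["name"])
        t = datetime.strptime(m["time"], "%H:%M")
        if m["phase"] == "START":
            stack = pending_ends.get(key)
            if stack:
                end_time, status = stack.pop()
                runs.append((m["who"], m["name"], status, end_time - t))
        else:
            pending_ends.setdefault(key, []).append((t, m["status"]))
    return runs


if __name__ == "__main__":
    sample = [
        "  • 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)",
        "  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)",
        "  • 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer",
        "  • 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer",
    ]
    for who, name, status, duration in pair_cookbook_runs(sample):
        print(who, name, status, duration)
```

Timestamps in the log have minute resolution, so durations are approximate. Runs that cross midnight, or concurrent runs of the same cookbook by the same user on the same host, would need extra handling.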

2021-06-02

  • 23:57 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:47 ryankemper: T280382 `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:26 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: Allow html form field option 'options-messages' to get parsed (T58633) (duration: 01m 01s)
  • 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgVectorConsolidateUserLinks on the beta cluster (T266536) (duration: 00m 57s)
  • 22:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
  • 22:34 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P{apt*}' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:30 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P{install*}' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:19 Amir1: setting charset of all tables in wikitech to binary (T284108 T269348)
  • 22:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
  • 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
  • 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:59 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
  • 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
  • 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool` (catching up on 17.9h lag)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 21:10 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
  • 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
  • 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9c981d: Revert "enwiktionary: Raise AF emergency disable treshold+count" (T283460) (duration: 00m 58s)
  • 18:11 urbanecm: Deployed security patch for T281972
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4bf76fc: Make DiscussionTools replytool available for everyone on wikitech (T283119) (duration: 00m 58s)
  • 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
  • 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
  • 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
  • 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 12:05 jbond: enable puppet fleet wide. post changing puppetdb to use nginx-light #T164456
  • 11:54 jbond: disable puppet fleet wide. changing puppetdb to use nginx-light #T164456
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: 85feaa1: InfoAction: Cast wgNamespaceProtection to array (T283751) (duration: 01m 00s)
  • 11:08 jbond: update mod_auth_cas T264605
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f12e368: Investigate MediaSearch usability on other wikis (T278984) (duration: 00m 57s)
  • 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #T264605
  • 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #T264605
  • 10:44 topranks: Commit pfw policy 1622570851 to pfw3-codfw and pfw3-eqiad to support new host fran2001 (T282056)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
  • 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280396 (request)' 'OTRS' 'VRT' 'Quiddity (WMF)' # T284118
  • 08:12 moritzm: removed eight inactive addresses from ops@ list
  • 07:44 moritzm: installing squid security updates
  • 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
  • 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
  • 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix pageterms API call for Special:Nearby in Wikidata (T281639) (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
  • 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
  • 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
  • 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
  • 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
  • off: restart tcpircbot-logmsgbot on alert1001 - T284123
  • 04:56 marostegui: Test

2021-06-01

  • 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per T284108
  • 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists (T282303)
  • 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
  • 15:38 legoktm: stopped mailman2 service on lists1001 (T52864)
  • 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 15:16 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 14:59 topranks: Restoring Lumen CCT 442550293 to normal metric / bring back into service (T274234)
  • 13:56 marostegui: Stop mysql on db2079 (codfw master) - T283743
  • 13:53 topranks: Draining Lumen CCT 442550293 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f75774: cawiki: Fix help panel links (T280673) (duration: 00m 58s)
  • 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973 (duration: 02m 58s)
  • 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973
  • 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service (T274234)
  • 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 12:12 dcausse: re-pooling wdqs1005 (caught-up lag)
  • 12:06 moritzm: installing djvulibre security updates
  • 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4989d2: Enable "Diff" RSS feed on meta (T283380) (duration: 00m 58s)
  • 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # T268225
  • 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # T268225
  • 07:26 dcausse: depooling wdqs1005 (lag)
  • 07:14 moritzm: installing nginx security updates
  • 05:56 legoktm: restarting mailman3 on lists1001
  • 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
  • 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 07s)
  • 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 00s)
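
The `dbctl commit` entries above show a database instance being repooled in steps (25%, 50%, 75%, 100%). As a rough sketch under stated assumptions, the ramp for each instance can be reconstructed from the log text. The pattern and the `repool_ramps` helper below are hypothetical, written only against the message format visible in these entries, and do not call or depend on `dbctl` itself.

```python
import re

# Assumed message shape, taken from the entries in this log (hypothetical pattern):
#   HH:MM user@host: dbctl commit (dc=all): '<instance> (re)pooling @ <N>%: ...'
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_ramps(lines):
    """Return {instance: [(time, percent), ...]} in chronological order.

    `lines` is a list of log entries in the log's own newest-first order.
    """
    ramps = {}
    for line in reversed(lines):  # walk oldest-first
        m = REPOOL_RE.search(line)
        if m:
            ramps.setdefault(m["instance"], []).append((m["time"], int(m["pct"])))
    return ramps


if __name__ == "__main__":
    sample = [
        "  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to ...",
        "  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to ...",
        "  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to ...",
        "  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to ...",
    ]
    for instance, steps in repool_ramps(sample).items():
        print(instance, steps)
```

Entries carrying a `[REPLAY FROM ...]` suffix are logged with the replay time rather than the original time, so a ramp built from them reflects when the line was replayed, not when the commit happened.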

2021-05-31

  • 07:32 legoktm: deleted all outgoing list mail that is for a gmail address being unsubscribed T284003
  • 07:30 legoktm: deleted all outgoing list mail that is for a yahoo/aol address being unsubscribed T284003
  • 07:23 legoktm: deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" T284003
  • 06:33 legoktm: manually unsubscribed ahalfaker [at] wikimedia.org from scoring-internal list, triggering mailman bounce loop T282348#7124014
  • 06:22 legoktm: sudo systemctl restart mailman3 on lists1001, bounce runner crashed

2021-05-29

  • 14:44 elukey: execute apt-get clean on an-airflow1001 to free space
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet

2021-05-28

2021-05-27

  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:45 thcipriani@deploy1002: Synchronized README: Config: Revert "README: deployment training" (duration: 00m 55s)
  • 23:38 derick@deploy1002: Synchronized README: Config: README: deployment training (duration: 00m 55s)
  • 23:21 egardner@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable MediaSearch Assessment filter (T276257) (duration: 00m 57s)
  • 22:06 urbanecm: Invalidate bot password for `PKM@PKMbot` (T283839)
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda #T279545
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda
  • 19:53 James_F: Manually create missing SecurePoll DB tables on mnwwiktionary, taywiki, and trvwiki for T283844
  • 19:48 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.7
  • 19:15 tgr: US morning deploys done
  • 19:12 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable Add Links for 50% of new users and all old ones (T277356) (duration: 01m 04s)
  • 19:03 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 05s)
  • 18:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:56 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ptwiki: Add 'flow-delete' to 'eliminator' user group (T283266) (duration: 01m 04s)
  • 18:49 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 06s)
  • 18:22 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:09 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Growth's community configuration on the pilot wikis (T283809) (duration: 01m 06s)
  • 17:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 James_F: Running SecurePoll maintenance script cli/updateNotBlockedKey.php for all wikis T277079
  • 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:58 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs1006's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_disk`
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs2004's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_disk`
  • 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 ryankemper: T280382 (fixing couple wrong host names in last log line) `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2001`, `wdqs2002`, and `wdqs2008`, have a 975G `wikidata.jnl`
  • 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2004` and `wdqs2001` have a 975G `wikidata.jnl`. It's not clear why there's such a big divergence
  • 15:41 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`
  • 15:12 XioNoX: test netconf over ssh on cr3-ulsfo
  • 15:03 effie: disable puppet mc2019
  • 14:14 moritzm: bounce keyholder-agent on cumin2001 to drop homer key (now on 2002 only)
  • 12:57 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
  • 12:55 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 12:50 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master T282761 (duration: 01m 04s)
  • 12:47 tgr: EU deploys done
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Always delete from search index in AddLinkSubmissionHandler (T283606) (duration: 01m 06s)
  • 12:40 topranks: cr2-eqord: Gerrit 696383: Removing IPv4 Anycast ranges from bgp_out policy.
  • 12:39 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Add Link: Prevent double-opening of the post-edit dialog (T283120) (duration: 01m 06s)
  • 12:25 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 11:51 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
  • 10:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082 12:19:41 <kormat> i thought also i might directly move pc1010 to pc2, so that it'll have a few days of pc2 cache available when we make it pc2 primary next week
  • 09:46 kormat: restarting mariadb on pc1007 to upgrade it
  • 08:35 topranks: removing stale peers (AS8674 / Netnod and AS57695 / Misaka) from cr2-esams
  • 08:30 moritzm: installing libx11 security updates
  • 07:45 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in AMS
  • 07:44 legoktm: adding stephane at kiwix as owner of offline-l per email
  • 07:43 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in eqsin
  • 07:42 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr2-eqord
  • 07:20 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to announce Wikidough Anycast range from cr's in ulsfo
  • 07:14 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config on cr1-eqdfw
  • 07:11 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config on cr2-codfw
  • 06:47 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 06:43 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 13s)
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16227 and previous config saved to /var/cache/conftool/dbconfig/20210527-060953-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P16226 and previous config saved to /var/cache/conftool/dbconfig/20210527-055507-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16225 and previous config saved to /var/cache/conftool/dbconfig/20210527-055450-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16224 and previous config saved to /var/cache/conftool/dbconfig/20210527-053946-root.json
  • 05:29 ryankemper: `ryankemper@cloudelastic1003:~$ sudo run-puppet-agent --force`
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16223 and previous config saved to /var/cache/conftool/dbconfig/20210527-052442-root.json

2021-05-26

  • 23:07 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 23:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 22:17 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 06s)
  • 22:10 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 07s)
  • 21:22 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 19:58 twentyafterfour: finished deploying wmf.7 and error levels appear unchanged. refs T281148
  • 19:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:51 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.7 (duration: 01m 07s)
  • 19:50 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train (duration: 05m 12s)
  • 19:50 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.7
  • 19:45 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train
  • 19:44 twentyafterfour: train is unblocked, proceeding to deploy wmf.7 to group1 wikis refs T281148
  • 19:44 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 19:44 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN
  • 19:43 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3 (duration: 01m 00s)
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3
  • 19:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 9f3410b: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 512d72e: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:25 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: 80abdf9: 92d2952: Enable VisualEditor by default at ptwikinews and plwikinews (T282846, T283033) (duration: 01m 09s)
  • 19:21 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 2
  • 19:17 legoktm: legoktm@deploy1002:~$ sudo -E kubectl delete pod kask-production-6d6869b697-m2qjs -n sessionstore
  • 19:16 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train (duration: 01m 23s)
  • 19:15 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train
  • 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f66b3b: Enable wgCiteResponsiveReferences on svwiki (T281622) (duration: 01m 06s)
  • 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 07b804b: Enable DiscussionTools on wikitech (T283119) (duration: 01m 05s)
  • 17:51 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:39 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:16 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Set $wgShellboxSecretKey - T281423 (duration: 01m 14s)
  • 17:02 moritzm: restarting FPM on mw canaries to pick up libx11 update
  • 16:51 moritzm: installing libx11 security updates
  • 16:38 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr2-codfw - Wikidough Anycast
  • 16:12 marostegui: Reboot db2107 (codfw master) T282072
  • 16:10 marostegui: Reboot db2103 (codfw master) T282072
  • 16:09 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:09 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:01 papaul: powerdown ms-be2038 for BBU replacement
  • 15:41 effie: enable puppet on mc2019
  • 15:31 marostegui: Cold reset db2107 idrac T283727
  • 15:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:23 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:22 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr1-codfw - Wikidough Anycast
  • 15:18 urbanecm: otrs_wikiwiki was moved to vrt-wiki.wikimedia.org (T280400)
  • 15:12 topranks: Merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/694305/ - Add Wikidough Anycast range to network config
  • 15:11 urbanecm@deploy1002: Synchronized wmf-config/: 490435e: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400) (duration: 01m 07s)
  • 15:08 urbanecm@deploy1002: Synchronized multiversion/MWMultiVersion.php: 945ee9c: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400; 1/2) (duration: 01m 06s)
  • 15:02 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 18s)
  • 14:59 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST (duration: 05m 24s)
  • 14:53 otto@deploy1002: Started deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST
  • 14:50 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 14:49 otto@deploy1002: Started deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN
  • 14:49 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 30m 22s)
  • 14:47 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:31 moritzm: updated bullseye d-i image to 2021-05-26 daily image T275873
  • 14:19 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:18 otto@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 00m 06s)
  • 14:18 otto@deploy1002: Started deploy [analytics/refinery@e536abd]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 14s)
  • 14:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
  • 14:03 hashar@deploy1002: Finished deploy [integration/docroot@ebee5d3]: composer/npm updates (duration: 00m 09s)
  • 14:03 hashar@deploy1002: Started deploy [integration/docroot@ebee5d3]: composer/npm updates
  • 11:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)
  • 11:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)
  • 11:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 06s)
  • 11:26 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 19s)
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Test Wikidata: Enable empty list to object serialization (T241422) (duration: 01m 19s)
  • 10:26 moritzm: installing lz4 security updates on buster
  • 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 01m 18s)
  • 09:45 godog: rm /root/prometheus from prometheus5001 - old transition files
  • 09:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 02m 12s)
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16222 and previous config saved to /var/cache/conftool/dbconfig/20210526-093647-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16221 and previous config saved to /var/cache/conftool/dbconfig/20210526-092144-root.json
  • 09:13 elukey: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/695192 on {cr1|cr2}-eqiad - T225005
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16220 and previous config saved to /var/cache/conftool/dbconfig/20210526-090640-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16219 and previous config saved to /var/cache/conftool/dbconfig/20210526-085137-root.json
  • 08:12 _joe_: purging images on deneb
  • 08:11 kormat: running 'optimize table' over parsercache db on pc1007 with replication enabled T282761
  • 07:14 ryankemper: Pooled `wdqs1013` (caught up on lag), de-pooled `wdqs2003` (should not have been pooled due to reimage failure)
  • 07:13 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs2003.codfw.wmnet
  • 05:46 marostegui: Stop MySQL on clouddb1021 to upgrade mysql
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16215 and previous config saved to /var/cache/conftool/dbconfig/20210526-051935-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P16214 and previous config saved to /var/cache/conftool/dbconfig/20210526-050919-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16213 and previous config saved to /var/cache/conftool/dbconfig/20210526-050431-root.json
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16212 and previous config saved to /var/cache/conftool/dbconfig/20210526-044928-root.json
  • 04:35 marostegui: Deploy schema change on db1106, this will generate lag on s1 (enwiki) on wiki replicas T266486 T268392 T273360
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16211 and previous config saved to /var/cache/conftool/dbconfig/20210526-043439-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16210 and previous config saved to /var/cache/conftool/dbconfig/20210526-043424-root.json
  • 03:29 eileen: process-control config revision is 7b646533da
  • 00:47 eileen: civicrm revision changed from 584b96452a to eac772e9c9, config revision is 2ca92c3c3c
  • 00:27 mutante: phab2001 - restarted apache2

2021-05-25

  • 23:09 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 22:39 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:04 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:04 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:58 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:58 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:40 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:28 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.7
  • 19:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.7
  • 18:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I2ebe96 (duration: 00m 56s)
  • 17:34 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761.
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json
  • 17:22 effie: disable puppet on mc2019 (for tests)
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json
  • 12:55 urbanecm@deploy1002: Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:52 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json
  • 11:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:21 Lucas_WMDE: EU backport&config window done
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change HTTP to HTTPS for concept URIs on Commons (T258590) (duration: 00m 56s)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16196 and previous config saved to /var/cache/conftool/dbconfig/20210525-111719-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16195 and previous config saved to /var/cache/conftool/dbconfig/20210525-110215-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16194 and previous config saved to /var/cache/conftool/dbconfig/20210525-104711-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16193 and previous config saved to /var/cache/conftool/dbconfig/20210525-103208-root.json
  • 09:58 ema: cp3054: upgrade varnish to latest LTS (6.0.7-1wm1) T264398
  • 09:28 jynus: updating puppet facts on cloud from puppetmaster1001
  • 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:01 kormat: stopping replication on pc1010 T282761
  • 09:00 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc1 primary T282761 (duration: 00m 58s)
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:18 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:17 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:14 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16192 and previous config saved to /var/cache/conftool/dbconfig/20210525-080234-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P16191 and previous config saved to /var/cache/conftool/dbconfig/20210525-074950-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16190 and previous config saved to /var/cache/conftool/dbconfig/20210525-074730-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16189 and previous config saved to /var/cache/conftool/dbconfig/20210525-073227-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16188 and previous config saved to /var/cache/conftool/dbconfig/20210525-071723-root.json
  • 06:16 kart_: Updated cxserver to 2021-05-15-034540-production (T276214)
  • 06:05 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:58 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:53 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:14 marostegui: Reload daily_account_consistency_check.service on mwmaint1002
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16187 and previous config saved to /var/cache/conftool/dbconfig/20210525-050921-root.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16186 and previous config saved to /var/cache/conftool/dbconfig/20210525-045417-root.json
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16185 and previous config saved to /var/cache/conftool/dbconfig/20210525-043914-root.json
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P16184 and previous config saved to /var/cache/conftool/dbconfig/20210525-043234-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P16183 and previous config saved to /var/cache/conftool/dbconfig/20210525-043129-marostegui.json
  • 04:25 marostegui: Stop MySQL on dbstore1004 to clone dbstore1006 T283125
  • 04:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16181 and previous config saved to /var/cache/conftool/dbconfig/20210525-042410-root.json
  • 02:06 James_F: 1.37.0-wmf.7 was branched at 7ee6a2e for T281148 by the TrainBranchBot
  • 00:48 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:44 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 00:37 bstorm: labstore1007 downtimed for maintenance T281045

2021-05-24

  • 21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 19:32 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:23 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:20 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:15 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:33 urbanecm: Morning B&C deployment done
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9cd344: Disable Education Program namespaces in hewiki (T217137) (duration: 00m 56s)
  • 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/skins/Vector/: 1742532687b: Introduce the vector-body class (T283206) (duration: 00m 57s)
  • 17:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:35 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:17 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:14 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1022.eqiad.wmnet
  • 15:55 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1022.eqiad.wmnet
  • 15:52 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:47 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:45 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:41 twentyafterfour: deploying phabricator hotfix (and restarting php7.3-fpm on phab1001)
  • 15:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1021.eqiad.wmnet
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16176 and previous config saved to /var/cache/conftool/dbconfig/20210524-150926-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16175 and previous config saved to /var/cache/conftool/dbconfig/20210524-145422-root.json
  • 14:50 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1021.eqiad.wmnet
  • 14:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1020.eqiad.wmnet
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16174 and previous config saved to /var/cache/conftool/dbconfig/20210524-143919-root.json
  • 14:36 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1020.eqiad.wmnet
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16173 and previous config saved to /var/cache/conftool/dbconfig/20210524-142415-root.json
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:40 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:37 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:33 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 12:18 urbanecm: Uninstalling Flow from ruwiki: Delete all pages in NS2600 (Flow's Topic) in ruwiki via deleteBatch.php (T282132; P16170)
  • 12:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 47e040b: ruwiki: Uninstall Flow (T282132) (duration: 00m 56s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16169 and previous config saved to /var/cache/conftool/dbconfig/20210524-113711-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16168 and previous config saved to /var/cache/conftool/dbconfig/20210524-112011-root.json
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1129e01: Remove wgGEMentorshipMigrationStage (T279853) (duration: 00m 57s)
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16167 and previous config saved to /var/cache/conftool/dbconfig/20210524-110508-root.json
  • 11:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 829c61d: Deploy Growth features to newcomers on bgwiki, urwiki (T280824, T280067) (duration: 00m 56s)
  • 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16166 and previous config saved to /var/cache/conftool/dbconfig/20210524-105004-root.json
  • 10:35 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 16s)
  • 10:35 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16165 and previous config saved to /var/cache/conftool/dbconfig/20210524-103501-root.json
  • 10:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@a9a577a]: (no justification provided) (duration: 00m 15s)
  • 10:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@a9a577a]: (no justification provided)
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16164 and previous config saved to /var/cache/conftool/dbconfig/20210524-075958-root.json
  • 07:49 XioNoX: bump Equinix Chicago RS max prefix
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16163 and previous config saved to /var/cache/conftool/dbconfig/20210524-074659-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16162 and previous config saved to /var/cache/conftool/dbconfig/20210524-074454-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16161 and previous config saved to /var/cache/conftool/dbconfig/20210524-072950-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16160 and previous config saved to /var/cache/conftool/dbconfig/20210524-071447-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 - schema change', diff saved to https://phabricator.wikimedia.org/P16159 and previous config saved to /var/cache/conftool/dbconfig/20210524-052747-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16158 and previous config saved to /var/cache/conftool/dbconfig/20210524-051345-root.json
  • 05:09 legoktm: restarting mailman3 on lists1001, bounce runner crashed
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16157 and previous config saved to /var/cache/conftool/dbconfig/20210524-045841-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16156 and previous config saved to /var/cache/conftool/dbconfig/20210524-044337-root.json
  • 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P16155 and previous config saved to /var/cache/conftool/dbconfig/20210524-043654-marostegui.json
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16154 and previous config saved to /var/cache/conftool/dbconfig/20210524-042834-root.json

2021-05-23

  • 14:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: EMERGENCY: f752f8b: enwiktionary: Raise AF emergency disable threshold+count (T283460) (duration: 00m 57s)

2021-05-22

  • 22:13 legoktm: reset 2FA for User:Yuvipanda on wikitech
  • 21:07 ryankemper: [WDQS] Pooled `wdqs1006` (caught up on lag), de-pooled `wdqs1013` (8 hours)
  • 16:35 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php cswiki --delete

2021-05-21

  • 22:32 bstorm: upload nfsd-ldap: 1.2+deb10u1 to buster-wikimedia T283385
  • 18:24 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:22 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:14 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:39 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:36 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:28 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 19s)
  • 17:21 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:17 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:01 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:19 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:14 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:07 clarakosi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:20 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:59 reedy@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 12:56 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:34 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetdb-api
  • 12:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:24 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=docker-registry
  • 12:23 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=docker-registry
  • 12:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16150 and previous config saved to /var/cache/conftool/dbconfig/20210521-122253-root.json
  • 12:15 topranks: "Removing BGP peering sessions to LinkedIn AS14413 at AMS-IX / cr2-esams as they are no longer on the exchange."
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16149 and previous config saved to /var/cache/conftool/dbconfig/20210521-120749-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16148 and previous config saved to /var/cache/conftool/dbconfig/20210521-115246-root.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16147 and previous config saved to /var/cache/conftool/dbconfig/20210521-113742-root.json
  • 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2008.codfw.wmnet
  • 09:51 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2007.codfw.wmnet
  • 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2006.codfw.wmnet
  • 09:32 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2005.codfw.wmnet
  • 09:32 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:28 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:26 gehel: depooling wdqs1006 to catch up on lag
  • 09:24 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2008.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2007.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2006.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:14 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2005.codfw.wmnet
  • 08:56 kormat: deploying cumin2002 grants to production T276589
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1001.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2004.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2003.wikimedia.org
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16146 and previous config saved to /var/cache/conftool/dbconfig/20210521-082009-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P16145 and previous config saved to /var/cache/conftool/dbconfig/20210521-080540-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16144 and previous config saved to /var/cache/conftool/dbconfig/20210521-080506-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16143 and previous config saved to /var/cache/conftool/dbconfig/20210521-075002-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16142 and previous config saved to /var/cache/conftool/dbconfig/20210521-073459-root.json
  • 06:32 moritzm: installing libspring-java security updates on stretch
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16141 and previous config saved to /var/cache/conftool/dbconfig/20210521-053027-root.json
  • 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16140 and previous config saved to /var/cache/conftool/dbconfig/20210521-051523-root.json
  • 05:14 moritzm: installing graphviz security updates on stretch
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16139 and previous config saved to /var/cache/conftool/dbconfig/20210521-050020-root.json
  • 04:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1087.eqiad.wmnet
  • 04:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1087.eqiad.wmnet
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P16138 and previous config saved to /var/cache/conftool/dbconfig/20210521-044717-marostegui.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16137 and previous config saved to /var/cache/conftool/dbconfig/20210521-044516-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P16136 and previous config saved to /var/cache/conftool/dbconfig/20210521-044339-marostegui.json
  • 01:27 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa
  • 01:18 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa

2021-05-20

  • 21:45 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:54 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mwlog1001.eqiad.wmnet
  • 19:43 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16134 and previous config saved to /var/cache/conftool/dbconfig/20210520-193039-root.json
  • 19:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16133 and previous config saved to /var/cache/conftool/dbconfig/20210520-191536-root.json
  • 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.6
  • 19:01 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16132 and previous config saved to /var/cache/conftool/dbconfig/20210520-190031-root.json
  • 18:56 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16131 and previous config saved to /var/cache/conftool/dbconfig/20210520-184527-root.json
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1002: sync-file aborted: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 00m 00s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 7fb129f: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 05s)
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
  • 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:33 moritzm: installing graphviz security updates on buster
  • 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
  • 15:30 _joe_: test
  • 15:23 moritzm: installing graphviz security updates on buster
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
  • 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - T283196 (duration: 01m 05s)
  • 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames T283167 (duration: 01m 05s)
  • 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
  • 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
  • 12:30 kormat: Deploying wmfmariadbpy 0.7 T283228
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16125 and previous config saved to /var/cache/conftool/dbconfig/20210520-112026-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16124 and previous config saved to /var/cache/conftool/dbconfig/20210520-110522-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16123 and previous config saved to /var/cache/conftool/dbconfig/20210520-105018-root.json
  • 10:15 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw T266486 T268392 T273360
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16122 and previous config saved to /var/cache/conftool/dbconfig/20210520-093510-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16121 and previous config saved to /var/cache/conftool/dbconfig/20210520-093257-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16120 and previous config saved to /var/cache/conftool/dbconfig/20210520-091754-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16119 and previous config saved to /var/cache/conftool/dbconfig/20210520-090250-root.json
  • 08:56 godog: move icinga-wm to libera.chat
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16118 and previous config saved to /var/cache/conftool/dbconfig/20210520-084746-root.json
  • 07:44 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:41 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16117 and previous config saved to /var/cache/conftool/dbconfig/20210520-071723-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16116 and previous config saved to /var/cache/conftool/dbconfig/20210520-071432-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16115 and previous config saved to /var/cache/conftool/dbconfig/20210520-065928-root.json
  • 06:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 06:50 ryankemper: T283223 Write queue not draining fast enough for the next node to reboot, will finish reboot tomorrow
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16114 and previous config saved to /var/cache/conftool/dbconfig/20210520-064425-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16113 and previous config saved to /var/cache/conftool/dbconfig/20210520-062921-root.json
  • 06:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/PageProps.php: Backport: PageProps: be prepared that PageIdentity is not proper title (T283170) (duration: 01m 06s)
  • 06:08 elukey: powercycle ms-be2035 - no ssh available, no metrics for hours, I/O errors logged on the main tty of the serial console
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16112 and previous config saved to /var/cache/conftool/dbconfig/20210520-054402-root.json
  • 05:33 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 05:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16111 and previous config saved to /var/cache/conftool/dbconfig/20210520-052859-root.json
  • 05:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1011.eqiad.wmnet
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16110 and previous config saved to /var/cache/conftool/dbconfig/20210520-051355-root.json
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1011.eqiad.wmnet
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16109 and previous config saved to /var/cache/conftool/dbconfig/20210520-050025-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16108 and previous config saved to /var/cache/conftool/dbconfig/20210520-045919-marostegui.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16107 and previous config saved to /var/cache/conftool/dbconfig/20210520-045852-root.json
  • 01:01 mutante: signing puppet certs for doh2001 and doh2002.wikimedia.org (T283192)
  • 00:14 ejegg: updated fundraising CiviCRM from b3fb3c9cb0 to 35f5afb1b4
  • 00:13 ejegg: updated payments-wiki from 9f51ace546 to 6fac77f60e

2021-05-19

  • 22:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ sleep 3600 && mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=7200 --user=Lusccasdeutsch . # T278856 # 3 video files
  • 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2002.wikimedia.org
  • 22:27 Urbanecm: Start server-side upload for 1 video file (T283186)
  • 22:25 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:22 Urbanecm: Start server-side upload for 3 video files (T283102, T283054)
  • 22:22 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:12 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 22:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2001.wikimedia.org
  • 22:09 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 11s)
  • 22:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 22:00 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:51 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2001.wikimedia.org
  • 21:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 20:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
  • 19:40 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
  • 18:30 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 18:23 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.6 T281147
  • 18:17 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 16:13 volans: uploaded debmonitor-client_0.3.0 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16103 and previous config saved to /var/cache/conftool/dbconfig/20210519-154808-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16102 and previous config saved to /var/cache/conftool/dbconfig/20210519-153304-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16101 and previous config saved to /var/cache/conftool/dbconfig/20210519-151800-root.json
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16100 and previous config saved to /var/cache/conftool/dbconfig/20210519-150257-root.json
  • 13:33 kormat: uploaded wmfmariadb 0.7 packages to apt
  • 13:29 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16099 and previous config saved to /var/cache/conftool/dbconfig/20210519-131920-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16098 and previous config saved to /var/cache/conftool/dbconfig/20210519-131012-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16097 and previous config saved to /var/cache/conftool/dbconfig/20210519-125508-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16096 and previous config saved to /var/cache/conftool/dbconfig/20210519-124004-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16095 and previous config saved to /var/cache/conftool/dbconfig/20210519-122501-root.json
  • 11:45 matthiasmullie: "EU backports done"
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16093 and previous config saved to /var/cache/conftool/dbconfig/20210519-114203-marostegui.json
  • 11:41 mlitn@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules: Backport: Add a link: Set contenteditable=false on mobile (T281771) (duration: 01m 06s)
  • 11:14 mlitn@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 2 (duration: 01m 04s)
  • 11:13 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 1 (duration: 01m 34s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16091 and previous config saved to /var/cache/conftool/dbconfig/20210519-092630-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16090 and previous config saved to /var/cache/conftool/dbconfig/20210519-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16089 and previous config saved to /var/cache/conftool/dbconfig/20210519-085622-root.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16088 and previous config saved to /var/cache/conftool/dbconfig/20210519-084119-root.json
  • 08:28 marostegui: Stop MySQL on db1175 to upgrade kernel and mysql
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16087 and previous config saved to /var/cache/conftool/dbconfig/20210519-082713-marostegui.json
  • 08:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump (duration: 02m 24s)
  • 08:10 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump
  • 07:48 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes (duration: 02m 23s)
  • 07:45 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16086 and previous config saved to /var/cache/conftool/dbconfig/20210519-074530-root.json
  • 07:42 XioNoX: roll SNMP: filter out default logical interfaces (.0) to all network devices - T283060
  • 07:38 godog: add 100G to prometheus/ops eqiad
  • 07:31 marostegui: Deploy schema change on s3 codfw, lag will appear in codfw T266486 T268392 T273360
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16085 and previous config saved to /var/cache/conftool/dbconfig/20210519-073027-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16084 and previous config saved to /var/cache/conftool/dbconfig/20210519-071523-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16083 and previous config saved to /var/cache/conftool/dbconfig/20210519-070019-root.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1010.eqiad.wmnet
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 T280492', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1010.eqiad.wmnet
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P16081 and previous config saved to /var/cache/conftool/dbconfig/20210519-063345-marostegui.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16080 and previous config saved to /var/cache/conftool/dbconfig/20210519-062824-root.json
  • 06:18 Amir1: upgrading daily-article-l to mailman3 (T282271 T280322)
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16079 and previous config saved to /var/cache/conftool/dbconfig/20210519-061321-root.json
  • 06:04 legoktm: restarted mailman3 on lists1001
  • 06:01 legoktm: stopped mailman3 service on lists1001 for schema change
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16078 and previous config saved to /var/cache/conftool/dbconfig/20210519-055817-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16077 and previous config saved to /var/cache/conftool/dbconfig/20210519-055134-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16076 and previous config saved to /var/cache/conftool/dbconfig/20210519-054313-root.json
  • 05:17 marostegui: Compress a few tables on s3 T283125
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109', diff saved to https://phabricator.wikimedia.org/P16075 and previous config saved to /var/cache/conftool/dbconfig/20210519-045857-marostegui.json
  • 03:03 reedy@deploy1002: Synchronized php-1.37.0-wmf.5/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 01m 05s)
  • 03:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.6/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 02m 35s)

2021-05-18

  • 18:40 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 05m 16s)
  • 18:35 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:35 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 00m 07s)
  • 18:34 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:33 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 15m 39s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3da5a8b: Update IP addresses for Wiki Education Dashboard exemptions (T283096) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1002: Synchronized w/robots.php: 8224e53: robots.php: avoid using ContentHandler::getContentText() (T268041) (duration: 01m 04s)
  • 18:17 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'db1085 being decommissioned T282096', diff saved to https://phabricator.wikimedia.org/P16073 and previous config saved to /var/cache/conftool/dbconfig/20210518-160053-kormat.json
  • 15:30 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 15:23 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 07s)
  • 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1085.eqiad.wmnet
  • 14:38 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on all wikis - T238138 (duration: 01m 06s)
  • 14:32 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.6
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1085.eqiad.wmnet
  • 14:21 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.6 (duration: 79m 07s)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16067 and previous config saved to /var/cache/conftool/dbconfig/20210518-142042-root.json
  • 14:17 moritzm: installing remaining postgresql-11 updates (client tools and libs, servers already done)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16066 and previous config saved to /var/cache/conftool/dbconfig/20210518-140538-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16065 and previous config saved to /var/cache/conftool/dbconfig/20210518-135034-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16064 and previous config saved to /var/cache/conftool/dbconfig/20210518-133531-root.json
  • 13:02 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.6
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P16063 and previous config saved to /var/cache/conftool/dbconfig/20210518-125945-marostegui.json
  • 12:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16062 and previous config saved to /var/cache/conftool/dbconfig/20210518-124247-root.json
  • 12:40 Krinkle: krinkle@mw1002 purge-parsercache-now.php on pc1010 (spare, depooled), ref P16060, T280605, T282761
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16061 and previous config saved to /var/cache/conftool/dbconfig/20210518-122744-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16059 and previous config saved to /var/cache/conftool/dbconfig/20210518-121240-root.json
  • 12:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.4 (duration: 01m 28s)
  • 12:07 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.3 (duration: 01m 50s)
  • 12:04 hashar@deploy1002: clean aborted: Pruned MediaWiki: 1.37.0-wmf.1 (duration: 01m 16s)
  • 12:04 hashar: scap clean 1.37.0-wmf.1 1.37.0-wmf.3 and 1.37.0-wmf.4 # T281147
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16058 and previous config saved to /var/cache/conftool/dbconfig/20210518-115736-root.json
  • 11:41 moritzm: upgrading idp2001 to Java 11.0.11
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P16057 and previous config saved to /var/cache/conftool/dbconfig/20210518-112942-marostegui.json
  • 10:53 moritzm: upgrade idp-test to OpenJDK 11.0.11 T281345
  • 10:27 moritzm: installing OpenJDK updates on Hadoop/Druid/AQS/kafka-Jumbo
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16056 and previous config saved to /var/cache/conftool/dbconfig/20210518-102607-root.json
  • 10:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16055 and previous config saved to /var/cache/conftool/dbconfig/20210518-101104-root.json
  • 10:03 kormat: stopping mariadb on db1085 T282096
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16054 and previous config saved to /var/cache/conftool/dbconfig/20210518-095600-root.json
  • 09:47 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16053 and previous config saved to /var/cache/conftool/dbconfig/20210518-094732-kormat.json
  • 09:44 XioNoX: 👍
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16052 and previous config saved to /var/cache/conftool/dbconfig/20210518-094056-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1087 from dbctl T282093', diff saved to https://phabricator.wikimedia.org/P16051 and previous config saved to /var/cache/conftool/dbconfig/20210518-093552-marostegui.json
  • 09:32 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16050 and previous config saved to /var/cache/conftool/dbconfig/20210518-093228-kormat.json
  • 09:30 topranks: add peering sessions to AS8708 RCS & RDS on cr2-esams
  • 09:27 XioNoX: push test SNMP filter config on asw-a-codfw - T283060
  • 09:17 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16049 and previous config saved to /var/cache/conftool/dbconfig/20210518-091725-kormat.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P16048 and previous config saved to /var/cache/conftool/dbconfig/20210518-091717-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16047 and previous config saved to /var/cache/conftool/dbconfig/20210518-091702-root.json
  • 09:04 kormat@cumin1001: dbctl commit (dc=all): 'Set db1131 to weight 400 in s6/eqiad T280751', diff saved to https://phabricator.wikimedia.org/P16046 and previous config saved to /var/cache/conftool/dbconfig/20210518-090449-kormat.json
  • 09:02 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16045 and previous config saved to /var/cache/conftool/dbconfig/20210518-090215-kormat.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16044 and previous config saved to /var/cache/conftool/dbconfig/20210518-090159-root.json
  • 09:01 kormat@cumin1001: dbctl commit (dc=all): 'Remove s6 eqiad primary from 'api' group T280751', diff saved to https://phabricator.wikimedia.org/P16043 and previous config saved to /var/cache/conftool/dbconfig/20210518-090156-kormat.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16042 and previous config saved to /var/cache/conftool/dbconfig/20210518-084643-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16041 and previous config saved to /var/cache/conftool/dbconfig/20210518-083139-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P16040 and previous config saved to /var/cache/conftool/dbconfig/20210518-075532-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16039 and previous config saved to /var/cache/conftool/dbconfig/20210518-075458-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16038 and previous config saved to /var/cache/conftool/dbconfig/20210518-073955-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16037 and previous config saved to /var/cache/conftool/dbconfig/20210518-072451-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16036 and previous config saved to /var/cache/conftool/dbconfig/20210518-070947-root.json
  • 07:06 marostegui: Deploy schema change on s4 codfw, lag will appear in codfw T266486 T268392 T273360
  • 06:54 XioNoX: Homerify cloudsw ospf
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P16035 and previous config saved to /var/cache/conftool/dbconfig/20210518-064426-marostegui.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1083.eqiad.wmnet
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16034 and previous config saved to /var/cache/conftool/dbconfig/20210518-064033-root.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1083.eqiad.wmnet
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1083 from dbctl T281445', diff saved to https://phabricator.wikimedia.org/P16033 and previous config saved to /var/cache/conftool/dbconfig/20210518-062947-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16032 and previous config saved to /var/cache/conftool/dbconfig/20210518-062529-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16031 and previous config saved to /var/cache/conftool/dbconfig/20210518-061026-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16030 and previous config saved to /var/cache/conftool/dbconfig/20210518-055522-root.json
  • 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1009.eqiad.wmnet
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1009.eqiad.wmnet
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P16029 and previous config saved to /var/cache/conftool/dbconfig/20210518-052324-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16028 and previous config saved to /var/cache/conftool/dbconfig/20210518-050949-marostegui.json
  • 05:06 marostegui: Restart db1115 mysql
  • 00:56 eileen: civicrm revision changed from 38ac15233f to b3fb3c9cb0, config revision is 1f8d0a6bfa

2021-05-17

  • 23:33 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 01s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 55s)
  • 21:46 sbassett: Deployed security patch (and ran scap sync-l10n) for T260865
  • 19:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize WikidataCompletionSearchClicks Event Platform migration - T282140 (duration: 00m 58s)
  • 19:13 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to Event Platform on group 0 and group 1 - T238138 (duration: 00m 59s)
  • 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/skins/Vector/includes/FeatureManagement/Requirements/LanguageInHeaderTreatmentRequirement.php: e180b99: Allow `languageinheader` query param to fully control treatment of languages (T282543) (duration: 00m 58s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: c30f92b5: Remove expired throttle rule (duration: 00m 59s)
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16022 and previous config saved to /var/cache/conftool/dbconfig/20210517-165322-root.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16021 and previous config saved to /var/cache/conftool/dbconfig/20210517-163819-root.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16020 and previous config saved to /var/cache/conftool/dbconfig/20210517-162315-root.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16019 and previous config saved to /var/cache/conftool/dbconfig/20210517-160811-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16018 and previous config saved to /var/cache/conftool/dbconfig/20210517-153311-root.json
  • 15:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.5
  • 15:26 elukey@deploy1002: Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359 (duration: 19m 48s)
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16017 and previous config saved to /var/cache/conftool/dbconfig/20210517-151807-root.json
  • 15:06 elukey@deploy1002: Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16016 and previous config saved to /var/cache/conftool/dbconfig/20210517-150303-root.json
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16015 and previous config saved to /var/cache/conftool/dbconfig/20210517-144800-root.json
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16014 and previous config saved to /var/cache/conftool/dbconfig/20210517-141737-marostegui.json
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16013 and previous config saved to /var/cache/conftool/dbconfig/20210517-141627-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16012 and previous config saved to /var/cache/conftool/dbconfig/20210517-140438-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16011 and previous config saved to /var/cache/conftool/dbconfig/20210517-140435-root.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16010 and previous config saved to /var/cache/conftool/dbconfig/20210517-140123-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16009 and previous config saved to /var/cache/conftool/dbconfig/20210517-134934-root.json
  • 13:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16008 and previous config saved to /var/cache/conftool/dbconfig/20210517-134931-root.json
  • 13:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16007 and previous config saved to /var/cache/conftool/dbconfig/20210517-134619-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16006 and previous config saved to /var/cache/conftool/dbconfig/20210517-133431-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16005 and previous config saved to /var/cache/conftool/dbconfig/20210517-133427-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16004 and previous config saved to /var/cache/conftool/dbconfig/20210517-133116-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16003 and previous config saved to /var/cache/conftool/dbconfig/20210517-131927-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16002 and previous config saved to /var/cache/conftool/dbconfig/20210517-131924-root.json
  • 13:10 marostegui: Upgrade kernel and mysql (10.4.19) on db1144:3314, db1144:3315
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314, db1144:3315 for kernel and mysql upgrade', diff saved to https://phabricator.wikimedia.org/P16001 and previous config saved to /var/cache/conftool/dbconfig/20210517-130935-marostegui.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16000 and previous config saved to /var/cache/conftool/dbconfig/20210517-125742-marostegui.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15999 and previous config saved to /var/cache/conftool/dbconfig/20210517-123548-root.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15998 and previous config saved to /var/cache/conftool/dbconfig/20210517-122045-root.json
  • 12:08 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:07 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15997 and previous config saved to /var/cache/conftool/dbconfig/20210517-120541-root.json
  • 12:04 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:55 marostegui: Deploy schema change on s8 codfw, lag will appear in codfw T266486 T268392 T273360
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15996 and previous config saved to /var/cache/conftool/dbconfig/20210517-115037-root.json
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswikibooks --fix
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswiki --fix
  • 11:49 Urbanecm: 11:49:22 Synchronized wmf-config/InitialiseSettings.php: a73fe2d: Make the Malaysian talk namespaces names consistent (duration: 01m 08s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1e06f83: Enable SandboxLink at azwiki (T282954) (duration: 01m 08s)
  • 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 32e4343: urwiki: Grant `editprotected` to eliminators (T281274) (duration: 01m 08s)
  • 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 36d29a6: Enable NewUserMessage on ptwikinews (T282845) (duration: 01m 09s)
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P15995 and previous config saved to /var/cache/conftool/dbconfig/20210517-111343-marostegui.json
  • 11:07 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T282886)
  • 11:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:06 urbanecm@deploy1002: Synchronized static/images/project-logos/: b1da7aa: Update bnwiki project logo (T282886) (duration: 01m 42s)
  • 11:03 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json
  • 10:37 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 07s)
  • 10:36 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:30 moritzm: installing postgresql-11 security updates
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json
  • 09:43 hashar: Restarted CI Jenkins to update the instant-messaging and ircbot plugins # T271122
  • 09:33 moritzm: installing libimage-exiftool-perl security updates
  • 09:29 topranks: push CR691140 to eqiad and codfw core routers - T282809
  • 09:18 hashar: Restarting CI Jenkins to upgrade the Gearman plugin # T281737
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P15990 and previous config saved to /var/cache/conftool/dbconfig/20210517-091636-marostegui.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15989 and previous config saved to /var/cache/conftool/dbconfig/20210517-091604-root.json
  • 09:06 ema: cp_eqsin: run confd-reload-vcl manually to fix /var/run/reload-vcl-state T282880
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15988 and previous config saved to /var/cache/conftool/dbconfig/20210517-090101-root.json
  • 08:52 vgutierrez: pool cp5016
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15987 and previous config saved to /var/cache/conftool/dbconfig/20210517-084557-root.json
  • 08:45 vgutierrez: depool cp5016
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15986 and previous config saved to /var/cache/conftool/dbconfig/20210517-083053-root.json
  • 08:28 Urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from global_group_restrictions where ggr_group="Indic_Bots"; # T282968
  • 08:26 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 93e61f7: Use svwiki 20th anniversary logos (T282389) (duration: 01m 08s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 0f356a3: Add svwiki 20th anniversary logos (T282389) (duration: 01m 12s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15985 and previous config saved to /var/cache/conftool/dbconfig/20210517-061232-marostegui.json
  • 06:01 kormat: restarting mariadb on db1131 to pick up report_host T266483
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15984 and previous config saved to /var/cache/conftool/dbconfig/20210517-055556-root.json
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1079.eqiad.wmnet
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15983 and previous config saved to /var/cache/conftool/dbconfig/20210517-054053-root.json
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1079.eqiad.wmnet
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15982 and previous config saved to /var/cache/conftool/dbconfig/20210517-052549-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1079 from dbctl T282079', diff saved to https://phabricator.wikimedia.org/P15981 and previous config saved to /var/cache/conftool/dbconfig/20210517-051728-marostegui.json
  • 05:13 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1131 until it's reimaged to buster T282124', diff saved to https://phabricator.wikimedia.org/P15980 and previous config saved to /var/cache/conftool/dbconfig/20210517-051312-kormat.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15979 and previous config saved to /var/cache/conftool/dbconfig/20210517-051045-root.json
  • 05:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 master and set section read-write T282124', diff saved to https://phabricator.wikimedia.org/P15978 and previous config saved to /var/cache/conftool/dbconfig/20210517-050740-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T282124', diff saved to https://phabricator.wikimedia.org/P15977 and previous config saved to /var/cache/conftool/dbconfig/20210517-050526-kormat.json
  • 05:05 kormat: Starting s6 eqiad failover from db1131 to db1173 - T282124
  • 04:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:46 kormat@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T282124', diff saved to https://phabricator.wikimedia.org/P15976 and previous config saved to /var/cache/conftool/dbconfig/20210517-044657-kormat.json
  • 04:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124', diff saved to https://phabricator.wikimedia.org/P15974 and previous config saved to /var/cache/conftool/dbconfig/20210517-043148-marostegui.json
  • 02:10 legoktm: uninstalled python3-dbg on lists1001
  • 01:31 legoktm: restarted mailman3-web
  • 00:13 legoktm: installing python3-dbg on lists1001

2021-05-16

  • 22:45 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=tawiki wikilove # T280326
  • 20:46 legoktm: restarted mailman3-web
  • 19:38 legoktm: restarted mailman3-web
  • 17:29 Amir1: restart mailman3-web
  • 02:39 legoktm: restarting mailman3-web on lists1001 again
  • 00:53 legoktm: restarted mailman3-web on lists1001, uwsgi looked like it got stuck, consuming all CPU/memory

2021-05-15

  • 12:33 Amir1: set fr_quality to 0 for all revisions on several wikis (T279761)
  • 06:54 Amir1: migrating most of last mailing lists of T280322

2021-05-14

  • 20:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet
  • 20:32 mutante: people1002 - decom'ing - please use people1003 and see list mail
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1002.eqiad.wmnet
  • 18:58 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:58 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:39 cdanis: ✔️ cdanis@install1003.wikimedia.org ~ 🕝☕ sudo systemctl restart squid.service
  • 18:14 mutante: people1003/people2002: awk -F: '$6 ~ "^\/home" {print $1,$6}' /etc/passwd | while read line ; do user=${line% *}; dir=${line#* }; sudo mkdir -p ${dir}/public_html; sudo chown $user ${dir}/public_html; done (courtesy of Jbond)
  • 17:49 bblack: install1003 - restored normal resolv.conf + re-enabled+ran puppet
  • 17:41 bblack: install1003 - restart squid
  • 17:35 bblack: install1003 - puppet disabled and /etc/resolv.conf manually patched over to deal with a current issue
  • 17:25 cdanis: rolled back cr1-eqiad/cr2-eqiad interface disables T282881
  • 17:10 cdanis: cdanis@re0.cr1-eqiad# set interfaces gr-3/3/0.1 disable # T282881
  • 17:03 cdanis: cdanis@re0.cr2-eqiad# set interfaces gr-4/3/0.2 disable # T282881
  • 15:22 cdanis@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:22 cdanis@cumin2002: START - Cookbook sre.network.cf
  • 15:05 Urbanecm: Start server-side upload for 1 video file (T282874)
  • 14:09 andrew@deploy1002: Finished deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard (duration: 04m 15s)
  • 14:04 andrew@deploy1002: Started deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard
  • 12:54 bblack: re-running puppet agent on cp5*
  • 12:19 jbond42: run puppet on CP servers
  • 04:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisionlist/RevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisiondelete/RevDelRevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:18 ariel@deploy1002: Finished deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path (duration: 00m 03s)
  • 04:18 ariel@deploy1002: Started deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path
  • 03:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/MapSources/includes/specials/MapSourcesPage.php: fix PHP notice T282833 (duration: 01m 07s)
  • 03:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/WikiPage.php: T282844 (duration: 01m 06s)
  • 03:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/PageArchive.php: T282844 (duration: 01m 07s)
  • 03:16 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/Revision/RevisionArchiveRecord.php: fix DeletedContributions breakage T282844 (duration: 01m 07s)
  • 03:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/logging/LogEventsList.php: fix PHP notice T282834 (duration: 01m 08s)
  • 00:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`

2021-05-13

  • 23:53 mutante: [sodium:~] $ sudo systemctl start update-ubuntu-mirror.service
  • 23:50 mutante: [sodium:~] $ sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 23:22 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikimediaEvents: Backport: Fix "final_state: vector" bug in VectorPrefDiffInstrumentation (T261842) (duration: 01m 07s)
  • 23:11 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WikiLove extension on tawiki (T280326) (duration: 01m 07s)
  • 23:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 9dc74e4: Revert "Enable media change tags on wikipedias" (T266067, T282822) (duration: 01m 07s)
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:39 dancy@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GeoData/includes/Hooks.php: Backport: Make sure mId exists (T282735) (duration: 01m 08s)
  • 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 80e5b9d: cd113a7: Enable structured_task/article/link_suggestion_interaction schema (T278177) (duration: 01m 06s)
  • 18:59 Urbanecm: Morning B&C is going to take a few more minutes
  • 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2001.codfw.wmnet
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 0856ae1: ca52e78: GrowthExperiments backports (T282711, T282175) (duration: 01m 08s)
  • 18:26 mutante: people2001 is going down - people1003 (eqiad) and people2002 (codfw) are your replacements on bullseye
  • 18:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2001.codfw.wmnet
  • 18:22 Urbanecm: Start server-side upload for 2 video files (T282643, T282644)
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4cd6a78: Growth features: Push elwiki and cawiki out of dark mode (T280673; T280172) (duration: 01m 07s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04eb9d3: Enable media change tags on wikipedias (T266067) (duration: 01m 07s)
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3300c3: 59c8448: Enable Extension:MediaSearch on (test)commons (T265939) (duration: 01m 08s)
  • 17:20 andrew@deploy1002: Finished deploy [horizon/deploy@3d160f6]: Adding Database dashboards (duration: 04m 08s)
  • 17:16 andrew@deploy1002: Started deploy [horizon/deploy@3d160f6]: Adding Database dashboards
  • 16:36 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1005 back to config (T273278) (duration: 01m 07s)
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 16:24 effie: rebooting poolcounter1005
  • 16:09 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1005 will be rebooted for updates (T273278) (duration: 01m 07s)
  • 15:58 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1004 back to config (T273278) (duration: 01m 07s)
  • 15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 15:46 effie: restarting poolcounter1004
  • 15:27 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1004 will be rebooted for updates (T273278) (duration: 01m 08s)
  • 14:49 Urbanecm: Start server-side upload for 1 video file (T282785)
  • 14:07 Urbanecm: Start server-side upload for 3 video files (T282558, T282556)
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches (gerrit:690070 gerrit:690071 gerrit:690072 gerrit:690073) (T278116 T278117 T278114 T278177 T278487 T278112 T278111 T278118) (duration: 01m 09s)
  • 11:00 hnowlan: deleting packages still referenced by jessie components: `sudo -i reprepro clearvanished --delete`
  • 10:46 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:40 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:31 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:21 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:43 kevinbazira@deploy1002: Finished deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723 (duration: 32m 50s)
  • 07:10 kevinbazira@deploy1002: Started deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723
  • 05:54 _joe_: running docker image prune on contint1001, which has 722 unlinked images stored in its docker daemon
  • 01:20 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2021-05-12

  • 23:48 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 07s)
  • 23:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: ef41396: PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 08s)
  • 23:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:27 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 21:54 ryankemper: T280382 `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 20:57 ottomata: starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - T273789
  • 20:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:25 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:15 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:11 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:10 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:07 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:05 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 19:00 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin2001` tmux session `elastic_restarts`
  • 19:00 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 18:59 ryankemper: [Elastic] Restarted `*search*` services on `elastic2058`
  • 18:48 mutante: rsyncing home dirs of people1003 over to people2002 as well (T280989)
  • 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 3999be1: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s)
  • 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb65aff: Update wordmark and tagline for kawiki (T278251; 2/2) (duration: 01m 09s)
  • 18:26 urbanecm@deploy1002: Synchronized static/images/mobile/: eb65aff: Update wordmark and tagline for kawiki (T278251; 1/2) (duration: 01m 06s)
  • 18:25 urbanecm@deploy1002: sync-file aborted: eb65aff: Update wordmark and tagline for kawiki (T278251) (duration: 00m 00s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0cd3297: Disable Education Program namespaces in cswiki (T282691) (duration: 01m 15s)
  • 18:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: 7f14913: Modern keys must be unset (T282646) (duration: 01m 08s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11defd4: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 23s)
  • 16:15 hnowlan: including envoyproxy_1.15.5-1_amd64.changes with reprepro
  • 15:51 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet
  • 14:45 aborrero@cumin2001: START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet
  • 14:02 marostegui: Upgraded mysql on clouddb1015
  • 14:01 marostegui: Upgraded mysql on clouddb1014
  • 13:57 kormat: uploaded wmfmariadbpy 0.6.1 for bullseye
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json
  • 13:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:06 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:05 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json
  • 12:44 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:42 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json
  • 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 6cc2530: c268d08: b89592e: 7620953: 8fd7610: GrowthExperiments backports (duration: 01m 17s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json
  • 11:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9939edb: zhwikinews: Allow sysops to grant/revoke transwiki group (T273405) (duration: 02m 17s)
  • 10:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:32 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 10:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 10:01 effie: reboot poolcounter2003 and poolcounter2004
  • 09:55 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1074 from dbctl T281959', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
  • 08:23 jbond42: rolling restart of ats
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 T282535', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 T282535', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
  • 05:00 marostegui: Stop MySQL on labsdb1009 labsdb1010 labsdb1011 T282524 T282523 T282522
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T282535', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T282535', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
  • 04:38 marostegui: Drop testing mailman3 databases T281548
  • 04:36 Amir1: importing archives of wikitech-l (T280322)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:35 mutante: people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run T280989
  • 01:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 08s)
  • 01:17 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 16s)
  • 00:54 mutante: made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
  • 00:51 mutante: [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \;

2021-05-11

  • 23:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ec37795: Change namespace names and aliases on tiwiki and tiwiktionary (T263840) (duration: 01m 07s)
  • 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5bc40ac: ptwiki: Use celebration logos in new vector (T281925) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eac843a: Make DT source mode toolbar available as beta on all wikis (T279124) (duration: 01m 12s)
  • 23:06 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-pt-20.png: 60e6e4e: ptwiki: Add wikipedia-pt-20.png (T281925) (duration: 01m 08s)
  • 23:02 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: e35199b: Adding square logo and wordmark for ptwiki 20 years celebration (T281925) (duration: 01m 50s)
  • 22:14 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lists1002.wikimedia.org
  • 22:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts lists1002.wikimedia.org
  • 21:37 Urbanecm: Start server-side upload for 3 video files (T282566, T282565, T282559)
  • 21:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 21:34 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 20:52 legoktm: upgraded mailman3 on lists1001
  • 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2002.codfw.wmnet
  • 20:24 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 06m 57s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 00m 05s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 17m 01s)
  • 20:00 mforns@deploy1002: Started deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 19:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2002.codfw.wmnet
  • 19:46 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 09m 45s)
  • 19:37 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:33 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.5
  • 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 00m 07s)
  • 19:29 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:28 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 45m 45s)
  • 18:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:53 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on testwiki - T238138 (duration: 01m 09s)
  • 18:52 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 18:20 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.5 (duration: 09m 43s)
  • 18:10 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 17:36 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 01m 25s)
  • 17:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:35 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:34 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 02m 27s)
  • 17:33 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:32 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:31 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev (duration: 01m 59s)
  • 17:29 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev
  • 17:20 mutante: the backend for people.wikimedia.org switched from people1002 to people1003, the people.wikimedia.org CNAME has been updated. MOTD is about to be updated to inform users.
  • 17:18 legoktm: disabled pipermail redirects on lists.wikimedia.org
  • 17:07 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:12 jynus: restarting bacula-dir on backup1001, stuck process
  • 15:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 15:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog1001.eqiad.wmnet
  • 15:55 bstorm: restart haproxy on dbproxy1018/9 to remove old config
  • 15:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 15:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog2001.codfw.wmnet
  • 15:37 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:36 dancy@deploy1002: sync-world aborted: testwikis wikis to 1.37.0-wmf.4 (duration: 02m 04s)
  • 15:34 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 dancy@deploy1002: scap failed: RuntimeError scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details) (duration: 17m 36s)
  • 15:31 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:27 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog2001.codfw.wmnet
  • 15:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 15:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:57 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:49 moritzm: installing busybox security updates
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:27 moritzm: installing cgal security updates
  • 14:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 hashar: Restarted CI Jenkins with a snapshot of the Gearman Jenkins plugin # T281737
  • 14:10 hashar: Restarted CI Jenkins for plugin upgrade # T282433
  • 14:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:01 hashar: Restarted releases Jenkins for plugin upgrade # T282433
  • 13:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1d4d007: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 02s)
  • 13:39 jbond42: rolling restart of ats-backend
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1027.eqiad.wmnet
  • 12:11 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1027.eqiad.wmnet
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913 and previous config saved to /var/cache/conftool/dbconfig/20210511-114540-root.json
  • 11:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912 and previous config saved to /var/cache/conftool/dbconfig/20210511-113036-root.json
  • 11:16 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add P2671 and P4839 to deprecated properties list (T280779) (duration: 00m 58s)
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911 and previous config saved to /var/cache/conftool/dbconfig/20210511-111532-root.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910 and previous config saved to /var/cache/conftool/dbconfig/20210511-110029-root.json
  • 10:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:46 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P15909 and previous config saved to /var/cache/conftool/dbconfig/20210511-102303-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15908 and previous config saved to /var/cache/conftool/dbconfig/20210511-102212-root.json
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15907 and previous config saved to /var/cache/conftool/dbconfig/20210511-100708-root.json
  • 09:54 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw2002-dev.codfw.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15904 and previous config saved to /var/cache/conftool/dbconfig/20210511-095204-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15903 and previous config saved to /var/cache/conftool/dbconfig/20210511-093701-root.json
  • 09:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 08:37 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:36 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:34 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:32 moritzm: installing hivex security updates
  • 08:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15901 and previous config saved to /var/cache/conftool/dbconfig/20210511-082038-marostegui.json
  • 08:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:55 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15899 and previous config saved to /var/cache/conftool/dbconfig/20210511-070742-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15898 and previous config saved to /var/cache/conftool/dbconfig/20210511-065238-root.json
  • 06:50 marostegui: Stop replication on db2094:3318 T282514
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15897 and previous config saved to /var/cache/conftool/dbconfig/20210511-063734-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15896 and previous config saved to /var/cache/conftool/dbconfig/20210511-062231-root.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1082.eqiad.wmnet
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1082.eqiad.wmnet
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:11 marostegui: Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15894 and previous config saved to /var/cache/conftool/dbconfig/20210511-050816-marostegui.json

2021-05-10

  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53: Update messages used for tech CoC (T280886) (duration: 00m 56s)
  • 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: ba8b786: NO-OP: Enable ChessBrowser on beta (T244075) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: dd6fa65: Use ptwiki 20th anniversary logos (T281925) (duration: 00m 59s)
  • 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: f2a76b1: Add ptwiki 20th anniversary logos (T281925) (duration: 00m 58s)
  • 22:28 eileen: civicrm revision changed from 2052d79248 to 38ac15233f, config revision is 47f21e4568
  • 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 00m 56s)
  • 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 01m 01s)
  • 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
  • 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 T282348
  • 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
  • 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 02m 07s)
  • 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 55s)
  • 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 21s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 36s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 15s)
  • 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 04m 10s)
  • 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: loginwiki: Allow users to mark Notifications as read (T264834) (duration: 00m 57s)
  • 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Disable LocalisationUpdate, part I (T158360) (duration: 00m 58s)
  • 18:24 XioNoX: add cmooney to all network devices
  • 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Enable VE desktop section edit links (T280291) (duration: 00m 57s)
  • 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712) (duration: 00m 57s)
  • 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored (duration: 00m 57s)
  • 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
  • 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
  • 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565) (duration: 00m 59s)
  • 15:20 elukey: restart rsyslog on rpki1001
  • 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
  • 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
  • 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 11:46 Urbanecm: EU B&C window done
  • 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3418237: Disabling Education Program namespaces in Russian Wikipedia (T282112) (duration: 00m 57s)
  • 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8bef11c: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T282007) (duration: 00m 57s)
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # T262155
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # T262155
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 068cd7e: Change namespace name and aliases on jawikivoyage (T262155) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9209d96: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies (T281968) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7f6f849: Add tmpSerializeEmptyListsAsObjects to Wikibase.php (T241422) (duration: 01m 01s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6138c64: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T241422) (duration: 00m 57s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 23271dd: Enable ReferencePreviews as full default on Marathi wiki (T282147) (duration: 00m 57s)
  • 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (3/3; T281972) (duration: 00m 56s)
  • 11:08 urbanecm@deploy1002: sync-file aborted: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (T281972) (duration: 00m 04s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (2/3; T281972) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (1/3; T281972) (duration: 00m 57s)
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
  • 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:31 moritzm: installing openjdk-11 security updates
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
  • 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - T281673
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T281959', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
  • 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
  • 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - T281673
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
  • 08:24 XioNoX: push pfw policies - T282286
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
  • 07:38 hashar: Restarted CI Jenkins # T281737
  • 06:37 elukey: apt-get clean on rpki1001 to free some space
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl T281794', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json
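
Several database instances in the entries above (for example db1129, db1146:3312 and db1156) are repooled in stages of 25%, 50%, 75% and 100%, roughly 15 minutes apart. The following is a minimal sketch of how those stages could be pulled out of the log text. It is not part of dbctl or any Wikimedia tooling: the function name `repool_stages` and the regular expression are assumptions made for illustration, and they only cover the line format seen in this log.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the staged repool entries as they appear in this log, e.g.
#   06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', ...
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_stages(lines):
    """Return {instance: [(time, percent), ...]} with each list sorted by time.

    Sorting "HH:MM" strings is only meaningful within a single day of the log.
    """
    stages = defaultdict(list)
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            stages[match.group("instance")].append(
                (match.group("time"), int(match.group("pct")))
            )
    return {instance: sorted(steps) for instance, steps in stages.items()}


# Entries copied from the 2021-05-10 section (newest first, as in the log).
sample = [
    "  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json",
    "  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json",
    "  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json",
    "  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json",
    # Depool entries have no "(re)pooling @ N%" part and are skipped.
    "  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json",
]

for instance, steps in repool_stages(sample).items():
    print(instance, steps)
```

Entries that do not match the pattern, such as the depool commit in the sample, are ignored, so the result lists only the staged repool steps per instance.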

2021-05-09

  • 21:44 legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
  • 18:28 legoktm: systemctl restart mailman3, bounce runner died again (T282348)
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 09:16 legoktm: mailman3 live hacked patch at https://phabricator.wikimedia.org/T282348#7072358 to fix bounce queue
  • 06:21 legoktm: restarting mailman3 service, bounce runner died
  • 04:27 Amir1: starting upgrade of batch H of mailing lists (T280322)

2021-05-08

  • 17:18 Amir1: starting upgrade of batch G of mailing lists (T280322)

2021-05-07

  • 21:40 legoktm: deleted education@ from MM3, didn't import properly
  • 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
  • 21:33 legoktm: fixed owner for wdqs-gui-build list
  • 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
  • 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 18:23 brennen: 1.37.0-wmf.4 train status (T281145): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
  • 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: LinkBatch: skip bad input (T282180 T282070) (duration: 01m 06s)
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
  • 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
  • 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
  • 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
  • 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
  • 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
  • 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
  • 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
  • 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 13:04 Urbanecm: Start server-side upload for 1 video file (T281927)
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
  • 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
  • 09:55 dcausse: depooling wdqs1012 T280382, T282222
  • 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - T281673
  • 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - T281673
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
  • 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear T266486 T268392 T273360
  • 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)
  • 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T282093', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in CirrusSearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in CirrusSearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart
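
The 03:04 to 03:08 [Elastic] entries above reason that banning two nodes at once can strand a shard when its primary and its only replica both sit on those two nodes. The check below is a minimal sketch of that reasoning, not something taken from the cookbooks or from Elasticsearch itself: the function name `shards_at_risk`, the placement mapping, and every node and shard name in it are hypothetical, and the model only asks whether each shard keeps at least one copy outside the banned set.

```python
from typing import Dict, Iterable, List, Set


def shards_at_risk(
    shard_copies: Dict[str, Set[str]], banned: Iterable[str]
) -> List[str]:
    """Return shards whose every copy (primary and replicas) is on a banned node.

    shard_copies maps a shard label to the set of node names holding a copy.
    This is an illustrative model only (an assumption, not Elasticsearch logic).
    """
    banned_set = set(banned)
    return sorted(
        shard
        for shard, nodes in shard_copies.items()
        # A shard is at risk when it has copies and all of them are banned.
        if nodes and nodes <= banned_set
    )


# Hypothetical placement: one shard with a single replica on the two nodes
# that get banned together, one shard spread across a third node.
placement = {
    "index_x[0]": {"node-a", "node-b"},
    "index_y[0]": {"node-a", "node-c"},
}

both = shards_at_risk(placement, ["node-a", "node-b"])
one_at_a_time = shards_at_risk(placement, ["node-a"])
print(both, one_at_a_time)
```

In this toy placement, banning both nodes together flags `index_x[0]`, while banning a single node flags nothing. That matches the plan logged at 03:08: ban one node, wait for the shards to redistribute, then ban the other.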

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message; will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" () (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
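
The stepped repooling visible in the entries above (for example db1121 going to 25%, 50%, 75% and 100% at 04:41, 04:56, 05:11 and 05:26) can be modeled with a minimal sketch. This is not the actual dbctl tooling: the function name `repool_schedule` and its parameters are hypothetical, and the step percentages and 15-minute interval are read off the log timestamps rather than taken from any configuration.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (timestamp, percentage) pairs for a stepped repool.

    Hypothetical helper: it only computes when each step would be
    committed; it does not talk to dbctl or any other service.
    """
    return [
        (start + timedelta(minutes=interval_minutes * i), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first db1121 repool entry in the log (04:41).
schedule = repool_schedule(datetime(2021, 5, 4, 4, 41))
for when, pct in schedule:
    print(f"{when:%H:%M} db1121 (re)pooling @ {pct}%")
```

Some hosts in the log use a different ramp, such as db1158 and db1082 starting at 10%, or db1178 starting at 1%; under the same assumptions those would be expressed by passing a different `steps` tuple.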

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123); GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks; Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

Archives

See Server Admin Log/Archives.