You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(eileen: civicrm revision changed from a1929b3dfd to a480bf03c9, config revision is 77cb7ec866)
imported>Stashbot
(bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .)
Line 1: Line 1:
== 2021-09-29 ==
== 2021-09-29 ==
* 23:20 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:05 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:02 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Catch TimelineException from fixMap() ([[phab:T292126|T292126]]) (duration: 01m 07s)
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Bump Timeline::CACHE_VERSION (duration: 01m 08s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]] (duration: 01m 08s)
* 20:21 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 jhuneidi@deploy1002: Finished scap: Fix pywikibot feature detection (duration: 13m 38s)
* 20:02 jhuneidi@deploy1002: Started scap: Fix pywikibot feature detection
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:06 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/scripts/renderTimeline.sh: Fix passing temp directory to EasyTimeline.pl (duration: 01m 07s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 dancy@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/resources/skins.minerva.base.styles/ui.less: Backport: [[gerrit:724787{{!}}Search header should be vertically centered, not top aligned(take 2) (T292071)]] (duration: 01m 08s)
* 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724776{{!}}Fully enable change dispatching via jobs on test wikis]], Part I (duration: 01m 09s)
* 17:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:724776{{!}}Fully enable change dispatching via jobs on test wikis]], Part I (duration: 01m 07s)
* 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
* 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:43 akosiaris: start hbal -L -G row_B -X on ganeti01.svc.codfw.wmnet . Rows C and D are fine
* 16:42 akosiaris: start hbal -L -G row_A -X on ganeti01.svc.codfw.wmnet
* 16:40 akosiaris: migrate kubemaster2001 off ganeti2007 and to ganeti2008 due to memory starvation on ganeti2007
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/WikimediaBadges/: Backport: [[gerrit:724561{{!}}Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953)]] (duration: 01m 08s)
* 16:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/WikimediaBadges/: Backport: [[gerrit:724560{{!}}Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953)]] (duration: 01m 10s)
* 15:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2006.codfw.wmnet
* 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:45 Amir1: disabled cron dispatching for mediawikiwiki
* 15:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724765{{!}}Enable change dispatching via jobs in wikidatawiki (T48643)]] (duration: 01m 08s)
* 15:44 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 15:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 15:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/client: Backport: [[gerrit:724558{{!}}Track time until dispatched recent changes are inserted (T291962)]] (duration: 01m 10s)
* 15:24 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
* 15:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:08 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 14:01 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:34 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 13:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:09 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:48 Lucas_WMDE: EU backport+config window done
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/skinStyles/mobile.startup/Overlay.less: Backport: [[gerrit:724553{{!}}Revert "Search header should be vertically centered, not top aligned." (T292030)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: [[gerrit:724371{{!}}Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377)]] (duration: 01m 07s)
* 11:43 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:722279{{!}}Enable line numbering on all namespaces (pilot wikis) (T280027)]] (duration: 01m 09s)
* 11:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: [[gerrit:724378{{!}}Fix almost all errors codes being logged as `http-0` (T290514)]] (duration: 01m 09s)
* 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: [[gerrit:724379{{!}}Fix almost all errors codes being logged as `http-0` (T290514)]] (duration: 01m 09s)
* 11:16 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 11:15 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
* 10:35 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 10:34 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
* 10:24 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 10:02 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
* 10:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
* 09:54 godog: bounce mtail on centrallog* - [[phab:T246470|T246470]]
* 09:47 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 11s)
* 09:39 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 08:58 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:22 ema: fleet-wide rm /etc/rsyslog.d/00-abort-unclean-config.conf && systemctl restart rsyslog
* 07:51 godog: fail sdg on be2036 - [[phab:T291988|T291988]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081 [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17345 and previous config saved to /var/cache/conftool/dbconfig/20210929-072520-marostegui.json
* 07:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:15 marostegui: Deploy schema change on s8 codfw (lag will show up) [[phab:T283499|T283499]]
* 06:10 ryankemper: [[phab:T289517|T289517]] Ran puppet across query_service fleet `sudo cumin -b 6 'P<nowiki>{</nowiki>w*qs*<nowiki>}</nowiki>' 'sudo run-puppet-agent'`
* 06:09 ryankemper: [[phab:T289517|T289517]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/720746 (fix dcat-ap loading)
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2103 [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17344 and previous config saved to /var/cache/conftool/dbconfig/20210929-055645-marostegui.json
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081 [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17342 and previous config saved to /var/cache/conftool/dbconfig/20210929-045033-marostegui.json
* 03:18 eileen: civicrm revision changed from {{Gerrit|a0bc324a61}} to {{Gerrit|2ecb8f0bcd}}, config revision is {{Gerrit|77cb7ec866}}
* 03:01 eileen: civicrm revision changed from {{Gerrit|1b7bae4033}} to {{Gerrit|a0bc324a61}}, config revision is {{Gerrit|77cb7ec866}}
* 03:00 eileen: civicrm revision changed from {{Gerrit|a480bf03c9}} to {{Gerrit|1b7bae4033}}, config revision is {{Gerrit|77cb7ec866}}
* 02:36 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler/PagedTiffHandler use Shellbox on all wikis but Commons (duration: 01m 07s)
* 00:52 eileen: civicrm revision changed from {{Gerrit|a1929b3dfd}} to {{Gerrit|a480bf03c9}}, config revision is {{Gerrit|77cb7ec866}}
* 00:52 eileen: civicrm revision changed from {{Gerrit|a1929b3dfd}} to {{Gerrit|a480bf03c9}}, config revision is {{Gerrit|77cb7ec866}}
* 00:27 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on all wikis (duration: 01m 18s)
* 00:27 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on all wikis (duration: 01m 18s)

Revision as of 23:20, 29 September 2021

2021-09-29

  • 23:20 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 23:05 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 23:02 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Catch TimelineException from fixMap() (T292126) (duration: 01m 07s)
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:37 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Bump Timeline::CACHE_VERSION (duration: 01m 08s)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.2 refs T281166 (duration: 01m 08s)
  • 20:21 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.2 refs T281166
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:16 jhuneidi@deploy1002: Finished scap: Fix pywikibot feature detection (duration: 13m 38s)
  • 20:02 jhuneidi@deploy1002: Started scap: Fix pywikibot feature detection
  • 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:06 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/scripts/renderTimeline.sh: Fix passing temp directory to EasyTimeline.pl (duration: 01m 07s)
  • 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:52 dancy@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/resources/skins.minerva.base.styles/ui.less: Backport: Search header should be vertically centered, not top aligned(take 2) (T292071) (duration: 01m 08s)
  • 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fully enable change dispatching via jobs on test wikis, Part I (duration: 01m 09s)
  • 17:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Fully enable change dispatching via jobs on test wikis, Part I (duration: 01m 07s)
  • 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
  • 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:43 akosiaris: start hbal -L -G row_B -X on ganeti01.svc.codfw.wmnet . Rows C and D are fine
  • 16:42 akosiaris: start hbal -L -G row_A -X on ganeti01.svc.codfw.wmnet
  • 16:40 akosiaris: migrate kubemaster2001 off ganeti2007 and to ganeti2008 due to memory starvation on ganeti2007
  • 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:34 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:25 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/WikimediaBadges/: Backport: Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953) (duration: 01m 08s)
  • 16:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/WikimediaBadges/: Backport: Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953) (duration: 01m 10s)
  • 15:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2006.codfw.wmnet
  • 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:45 Amir1: disabled cron dispatching for mediawikiwiki
  • 15:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable change dispatching via jobs in wikidatawiki (T48643) (duration: 01m 08s)
  • 15:44 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
  • 15:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
  • 15:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/client: Backport: Track time until dispatched recent changes are inserted (T291962) (duration: 01m 10s)
  • 15:24 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
  • 15:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
  • 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
  • 14:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 14:08 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
  • 14:01 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:09 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:48 Lucas_WMDE: EU backport+config window done
  • 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/skinStyles/mobile.startup/Overlay.less: Backport: Revert "Search header should be vertically centered, not top aligned." (T292030) (duration: 01m 07s)
  • 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377) (duration: 01m 07s)
  • 11:43 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable line numbering on all namespaces (pilot wikis) (T280027) (duration: 01m 09s)
  • 11:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: Fix almost all errors codes being logged as `http-0` (T290514) (duration: 01m 09s)
  • 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools/modules/dt.ui.ReplyWidget.js: Backport: Fix almost all errors codes being logged as `http-0` (T290514) (duration: 01m 09s)
  • 11:16 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 11:15 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
  • 10:35 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 10:34 volans@cumin2002: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1001.eqiad.wmnet
  • 10:24 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 10:02 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
  • 10:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: testing latest change
  • 09:54 godog: bounce mtail on centrallog* - T246470
  • 09:47 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:40 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 11s)
  • 09:39 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 08:58 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:22 ema: fleet-wide rm /etc/rsyslog.d/00-abort-unclean-config.conf && systemctl restart rsyslog
  • 07:51 godog: fail sdg on be2036 - T291988
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081 T290868', diff saved to https://phabricator.wikimedia.org/P17345 and previous config saved to /var/cache/conftool/dbconfig/20210929-072520-marostegui.json
  • 07:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:15 marostegui: Deploy schema change on s8 codfw (lag will show up) T283499
  • 06:10 ryankemper: T289517 Ran puppet across query_service fleet `sudo cumin -b 6 'P{w*qs*}' 'sudo run-puppet-agent'`
  • 06:09 ryankemper: T289517 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/720746 (fix dcat-ap loading)
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2103 T290865', diff saved to https://phabricator.wikimedia.org/P17344 and previous config saved to /var/cache/conftool/dbconfig/20210929-055645-marostegui.json
  • 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081 T290868', diff saved to https://phabricator.wikimedia.org/P17342 and previous config saved to /var/cache/conftool/dbconfig/20210929-045033-marostegui.json
  • 03:18 eileen: civicrm revision changed from a0bc324a61 to 2ecb8f0bcd, config revision is 77cb7ec866
  • 03:01 eileen: civicrm revision changed from 1b7bae4033 to a0bc324a61, config revision is 77cb7ec866
  • 03:00 eileen: civicrm revision changed from a480bf03c9 to 1b7bae4033, config revision is 77cb7ec866
  • 02:36 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler/PagedTiffHandler use Shellbox on all wikis but Commons (duration: 01m 07s)
  • 00:52 eileen: civicrm revision changed from a1929b3dfd to a480bf03c9, config revision is 77cb7ec866
  • 00:27 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on all wikis (duration: 01m 18s)
  • 00:21 ryankemper: T280001 `ryankemper@authdns1001:~$ sudo -i authdns-update` following merge of https://gerrit.wikimedia.org/r/c/operations/dns/+/724538
  • 00:19 ryankemper: T280001 Okay now we're clear to proceed to https://wikitech.wikimedia.org/wiki/LVS#For_active/active_services; merging https://gerrit.wikimedia.org/r/c/operations/dns/+/724538
  • 00:15 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo cumin 'A:icinga or A:dns-auth' run-puppet-agent` per https://wikitech.wikimedia.org/wiki/LVS#Make_the_service_page,_add_discovery_resources
  • 00:14 ryankemper: T280001 Moving wcqs state from `monitoring_setup` to `production`; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/724536

2021-09-28

  • 23:53 ryankemper: T280001 New icinga checks are green, will proceed to next step of moving wcqs state from `monitoring_setup` -> `production`
  • 23:49 ryankemper: T280001 New icinga alerts showing up as expected following wcqs state change to `monitoring_setup`: `LVS wcqs codfw port 443/tcp - Wikimedia Commons Query Service IPv4` and `LVS wcqs eqiad port 443/tcp - Wikimedia Commons Query Service IPv4`
  • 23:45 ryankemper: T280001 Changing wcqs state from `lvs_setup` to `monitoring_setup`: `ryankemper@cumin1001:~$ sudo cumin 'A:icinga' 'run-puppet-agent'`
  • 23:14 ryankemper: !log T282117 `error: plugin_geoip: Invalid resource name 'disc-wcqs' detected from zonefile lookup` We must be missing a line, reverting change to fix
  • 23:14 ryankemper: T282117 `ryankemper@authdns1001:~$ sudo -i authdns-update` following merge of https://gerrit.wikimedia.org/r/724520
  • 23:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2419.codfw.wmnet with reason: REIMAGE
  • 23:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2419.codfw.wmnet with reason: REIMAGE
  • 22:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2418.codfw.wmnet with reason: REIMAGE
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2418.codfw.wmnet with reason: REIMAGE
  • 22:41 legoktm@deploy1002: Finished scap: Fix erroneous en-gb translations in 1.38.0-wmf.1 (T291717) (duration: 17m 43s)
  • 22:25 eileen: civicrm revision changed from b8f756b60e to a1929b3dfd, config revision is 77cb7ec866
  • 22:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2417.codfw.wmnet with reason: REIMAGE
  • 22:23 legoktm@deploy1002: Started scap: Fix erroneous en-gb translations in 1.38.0-wmf.1 (T291717)
  • 22:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2417.codfw.wmnet with reason: REIMAGE
  • 22:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2416.codfw.wmnet with reason: REIMAGE
  • 22:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2416.codfw.wmnet with reason: REIMAGE
  • 22:15 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs
  • 21:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2415.codfw.wmnet with reason: REIMAGE
  • 21:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2415.codfw.wmnet with reason: REIMAGE
  • 21:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2414.codfw.wmnet with reason: REIMAGE
  • 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2414.codfw.wmnet with reason: REIMAGE
  • 21:22 ryankemper: T280247 Puppet run complete on all of `cp-text`, trafficserver backend work is done
  • 21:22 pt1979@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2005.codfw.wmnet
  • 21:19 bd808: bd808@mwmaint1002 echo "https://toolhub.wikimedia.org/static/js/chunk-vendors.js" | mwscript purgeList.php
  • 21:17 topranks: Configure cr2-esams for NaWas BGP peering to gateway-1 IPv6 and gateway-2 (T288505)
  • 21:11 topranks: Configure cr2-esams for NaWas BGP peering to gateway-1 IPv4 (T288505)
  • 21:10 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin -b 5 'A:cp-text' 'sudo run-puppet-agent --force'`
  • 21:09 ryankemper: T280247 `ryankemper@cp1075:~$ sudo grep commons-query /etc/trafficserver/remap.config` shows `map http://commons-query.wikimedia.org https://wcqs.discovery.wmnet`; proceeding to rest of fleet in batches of 5
  • 21:08 pt1979@cumin1001: START - Cookbook sre.experimental.reimage for host thumbor2005.codfw.wmnet
  • 21:07 ryankemper: T280247 Running on single cp-text host: `ryankemper@cp1075:~$ sudo run-puppet-agent --force`
  • 21:05 ryankemper: T280247 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/720078
  • 21:03 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin 'A:cp-text' 'sudo disable-puppet "Add trafficserver backend mapping for commons-query.wikimedia.org - T280247"'`
  • 21:02 legoktm: legoktm@deploy1002:~$ echo "https://toolhub.wikimedia.org/" | mwscript purgeList.php
  • 20:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 20:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 20:51 ryankemper: T280247 Puppet successfully ran on all `w*qs*` hosts; GUI working as before for WDQS, and WCQS seems fine as well. Deploy succeeded without any hitches
  • 20:49 legoktm: re-enabling and running puppet on A:cp-text: sudo cumin -b 5 A:cp-text 'enable-puppet --force && run-puppet-agent'
  • 20:49 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:49 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:41 legoktm: disabling puppet on A:cp-text in preparation for adding toolhub
  • 20:38 ryankemper: T280247 `ryankemper@cumin1001:~$ sudo cumin -b 5 'P{w*qs*}' 'sudo run-puppet-agent --force'`; 25 hosts total so will take 5 iterations
  • 20:37 ryankemper: T280247 Test queries on `wdqs1003` passed (tunneled into `wdqs1003`), proceeding to rest of fleet
  • 20:37 ryankemper: T280247 Ran on wdqs canary `wdqs1003`: `ryankemper@wdqs1003:~$ sudo run-puppet-agent --force`
  • 20:33 ryankemper: T280247 Running on single wcqs hosts: `ryankemper@wcqs1001:~$ sudo run-puppet-agent --force`
  • 20:33 ryankemper: T280247 `ryankemper@cumin1001` -> `sudo cumin 'P{w*qs*}' 'sudo disable-puppet "Make query_service nginx proxy to GUI microsite - T280247"'`
  • 20:33 topranks: Adding IPv6 address to NaWas sub-interfaceon cr2-esams (AMS-IX) - T288505
  • 19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.2 refs T281166
  • 19:35 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Use IPUtils instead of removed IP class (T292010) (duration: 01m 09s)
  • 19:27 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.1"
  • 19:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.2 refs T281166
  • 19:05 legoktm@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=toolhub
  • 19:04 legoktm: adding toolhub to discovery DNS (T280881)
  • 19:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 20s)
  • 19:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 18:54 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721600 (add wcqs scap dsh groups), running puppet on scap::dsh hosts: `ryankemper@cumin1001:~$ sudo cumin 'P:scap::dsh' 'sudo run-puppet-agent'`
  • 18:45 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.2 refs T281166 (duration: 49m 27s)
  • 18:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2005.codfw.wmnet
  • 18:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster1005.eqiad.wmnet with reason: REIMAGE
  • 18:18 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 08s)
  • 18:18 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 18:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1005.eqiad.wmnet with reason: REIMAGE
  • 18:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: REIMAGE
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: REIMAGE
  • 18:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:01 pt1979@cumin1001: START - Cookbook sre.experimental.reimage for host thumbor2005.codfw.wmnet
  • 18:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:57 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:57 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:55 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.2 refs T281166
  • 17:50 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw2413.codfw.wmnet
  • 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:46 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:44 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 11s)
  • 17:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 17:35 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 17s)
  • 17:35 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2413.codfw.wmnet
  • 17:35 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 17:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 06s)
  • 17:32 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 17:29 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 02m 43s)
  • 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
  • 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 24s)
  • 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
  • 17:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host mw2413.codfw.wmnet
  • 17:14 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1f90e6f]: tegola: hard code threshold because deployment fails (duration: 00m 18s)
  • 17:13 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1f90e6f]: tegola: hard code threshold because deployment fails
  • 17:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests (duration: 00m 11s)
  • 17:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests
  • 17:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2413.codfw.wmnet
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw2412.codfw.wmnet
  • 16:46 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2412.codfw.wmnet
  • 16:39 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 16:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests (duration: 00m 14s)
  • 16:28 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3e52e0a]: tegola: use global config var for load tests
  • 16:27 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:19 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@f35571e] (eqiad): tegola: mirror kartotherian/eqiad traffic to codfw/tegola (duration: 00m 18s)
  • 16:19 mbsantos@deploy1002: Started deploy [kartotherian/deploy@f35571e] (eqiad): tegola: mirror kartotherian/eqiad traffic to codfw/tegola
  • 16:16 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 16:13 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 16:12 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host mw2412.codfw.wmnet
  • 16:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:07 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:53 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host mw2412.codfw.wmnet
  • 15:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:39 _joe_: restarting pybal on lvs2010
  • 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 14:51 _joe_: restarting pybals in codfw again
  • 14:41 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 14:39 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 14:38 marostegui: Remove flaggedimages from s5 T290340
  • 14:36 _joe_: restarting pybal on lvs2009
  • 14:34 _joe_: restarting pybal on lvs1015
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:32 _joe_: restarting pybal on lvs2010
  • 14:32 arturo: add packages for buster-wikimedia|thirdparty/kubeadm-k8s-1-20 (T280402)
  • 14:31 _joe_: restarting pybal on lvs1016
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 T290868', diff saved to https://phabricator.wikimedia.org/P17339 and previous config saved to /var/cache/conftool/dbconfig/20210928-134030-marostegui.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T290865', diff saved to https://phabricator.wikimedia.org/P17337 and previous config saved to /var/cache/conftool/dbconfig/20210928-134012-marostegui.json
  • 13:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on centrallog2002.codfw.wmnet with reason: REIMAGE
  • 13:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog2002.codfw.wmnet with reason: REIMAGE
  • 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host db2103.codfw.wmnet
  • 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:30 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:03 marostegui@cumin1001: START - Cookbook sre.experimental.reimage for host db2103.codfw.wmnet
  • 13:01 btullis@deploy1002: Finished deploy [analytics/refinery@380d165] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@380d165] (duration: 07m 02s)
  • 12:54 btullis@deploy1002: Started deploy [analytics/refinery@380d165] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@380d165]
  • 12:54 btullis@deploy1002: Finished deploy [analytics/refinery@380d165] (thin): Regular analytics weekly train THIN [analytics/refinery@380d165] (duration: 00m 07s)
  • 12:53 btullis@deploy1002: Started deploy [analytics/refinery@380d165] (thin): Regular analytics weekly train THIN [analytics/refinery@380d165]
  • 12:53 btullis@deploy1002: Finished deploy [analytics/refinery@380d165]: Regular analytics weekly train [analytics/refinery@380d165] (duration: 17m 42s)
  • 12:35 btullis@deploy1002: Started deploy [analytics/refinery@380d165]: Regular analytics weekly train [analytics/refinery@380d165]
  • 12:29 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:27 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:11 urbanecm: [urbanecm@wtp1026 ~]$ sudo -i /usr/local/sbin/restart-php7.2-fpm
  • 12:10 Lucas_WMDE: lucaswerkmeister-wmde@wtp1026:~$ sudo -u mwdeploy /usr/local/sbin/restart-php7.2-fpm # attempt to solve a recurrence of T290120, but it failed
  • 12:06 marostegui: Remove flaggedimages from s7 T290340
  • 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:57 Lucas_WMDE: EU backport+config window done
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/Wikibase/repo/includes/Store/Sql/SqlSiteLinkConflictLookup.php: Backport: Use CONN_TRX_AUTOCOMMIT in SqlSiteLinkConflictLookup (T291377) (duration: 00m 57s)
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
  • 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:29 marostegui: Deploy schema change on s3 codfw (lag will show up) T283499
  • 11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add support for SectionTranslationTargetLanguages (T290302, T290175) (duration: 00m 57s)
  • 11:29 arturo: cleanup unused repo component buster-wikimedia|thirdparty/kubeadm-k8s-1-18 (T280402)
  • 11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:27 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 11:25 marostegui: Deploy schema change on s6 codfw (lag will show up) T283499
  • 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 11:09 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new dispatch via job approach on testwikidata and testwiki (T291610) (duration: 00m 57s)
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
  • 11:07 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 11:05 effie: downgrading scap to 3.17.1 on deploy1002 - T291095
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 10:53 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
  • 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 10:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 10:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 10:16 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 10:10 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 10:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:01 marostegui: Deploy schema change on s5 codfw (lag will show up) T283499
  • 10:00 marostegui: Deploy schema change on s7 codfw (lag will show up) T283499
  • 09:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 09:50 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 09:48 _joe_: removing old builds from compiler1002.puppet-diffs.eqiad1.wikimedia.cloud
  • 09:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 09:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 09:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 09:37 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 09:27 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 09:26 marostegui: Deploy schema change on s4 codfw (lag will show up) T283499
  • 09:23 marostegui: Deploy schema change on s2 codfw (lag will show up) T283499
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host db2080.codfw.wmnet
  • 08:57 effie: upgrade scap on eqiad and codfw - T291095
  • 08:30 marostegui@cumin1001: START - Cookbook sre.experimental.reimage for host db2080.codfw.wmnet
  • 08:17 volans: uploaded spicerack_1.0.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 07:38 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
  • 07:21 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
  • 07:14 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 06:54 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 06:52 volans@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1002.eqiad.wmnet
  • 06:52 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 06:42 volans: installed spicerack 1.0.2 on cumin2002
  • 05:10 marostegui: Remove flaggedimages from s6 T290340
  • 02:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:26 eileen: civicrm revision changed from ef5367bffc to b8f756b60e, config revision is 77cb7ec866

2021-09-27

  • 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:40 krinkle@deploy1002: Synchronized docroot/wikipedia.org/speed-tests/: I82f072 (duration: 00m 59s)
  • 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1891d28: Deploy Growth features to 100% of newcomers of small wikis (T291876) (duration: 00m 57s)
  • 22:58 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 22:57 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 22:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:34 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox on group1 wikis too (T289227) (duration: 00m 57s)
  • 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:27 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 56s)
  • 22:26 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PdfHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)
  • 22:25 legoktm@deploy1002: sync-file aborted: Have PdfHandler use Shellbox service on group0 wikis (T289228) (duration: 00m 00s)
  • 22:23 maryum: deployed security patch for T291696
  • 22:14 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (2/2) (duration: 00m 58s)
  • 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:13 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have PagedTiffHandler use Shellbox service on group0 wikis (T289228) (1/2) (duration: 00m 57s)
  • 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:41 tzatziki: re-running `extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php` for MCDC elections (in a screen this time) (https://phabricator.wikimedia.org/T291668)
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:36 mutante: puppetmaster2001 - systemctl disable sync-puppet-ca, systemctl unmask sync-puppet-ca, rm /usr/lib/systemd/system/sync-puppet-ca.*, systemctl stop sync-puppet-ca.timer
  • 21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:33 tzatziki: running `extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php` for MCDC elections
  • 21:29 mutante: puppetmaster2001 - rm /usr/lib/systemd/system/sync-puppet-ca.*
  • 21:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:24 mutante: puppetmaster2001 systemctl reset-failed
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:20 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Set $wgTimelineFonts and send all Timeline generation to Shellbox (T289226) (2/2) (duration: 00m 56s)
  • 21:18 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set $wgTimelineFonts and send all Timeline generation to Shellbox (T289226) (1/2) (duration: 00m 56s)
  • 21:16 mutante: puppetmaster2001 - /usr/bin/rsync -avz --delete puppetmaster1001.eqiad.wmnet::puppet_ca /var/lib/puppet/server/ssl/ca
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:01 legoktm@deploy1002: Synchronized docroot/: Configure Timeline like most other extensions (4/3) (duration: 00m 56s)
  • 20:59 legoktm@deploy1002: Synchronized wmf-config/: Configure Timeline like most other extensions (3/3) (duration: 00m 57s)
  • 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:56 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Configure Timeline like most other extensions (2/3) (duration: 00m 56s)
  • 20:50 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Configure Timeline like most other extensions (1/3) (duration: 00m 58s)
  • 20:42 mutante: [puppetmaster2001:~] $ sudo systemctl start sync-puppet-volatile
  • 20:28 brennen: gitlab1001: done with user renames, restarting gitlab to apply session duration value after a reconfiguration
  • 20:06 brennen: gitlab1001: ~1hr downtime to attempt migration of usernames to shell uid (T288392)
  • 20:00 mutante: ms-be2036 - remove commeeted out swift-drive-audit cron
  • 19:55 eileen: civicrm revision changed from 18228490ae to ef5367bffc, config revision is 77cb7ec866
  • 19:32 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:32 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:28 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:28 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:24 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:24 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:22 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:22 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:16 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:16 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1082.eqiad.wmnet with reason: REIMAGE
  • 19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1081.eqiad.wmnet with reason: REIMAGE
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1080.eqiad.wmnet with reason: REIMAGE
  • 19:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1083.eqiad.wmnet with reason: REIMAGE
  • 19:13 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:13 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1079.eqiad.wmnet with reason: REIMAGE
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1083.eqiad.wmnet with reason: REIMAGE
  • 19:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1082.eqiad.wmnet with reason: REIMAGE
  • 19:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1078.eqiad.wmnet with reason: REIMAGE
  • 19:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1081.eqiad.wmnet with reason: REIMAGE
  • 19:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1080.eqiad.wmnet with reason: REIMAGE
  • 19:08 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:08 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1079.eqiad.wmnet with reason: REIMAGE
  • 19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1078.eqiad.wmnet with reason: REIMAGE
  • 19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1077.eqiad.wmnet with reason: REIMAGE
  • 19:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1076.eqiad.wmnet with reason: REIMAGE
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1077.eqiad.wmnet with reason: REIMAGE
  • 19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1075.eqiad.wmnet with reason: REIMAGE
  • 19:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1076.eqiad.wmnet with reason: REIMAGE
  • 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1075.eqiad.wmnet with reason: REIMAGE
  • 18:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1074.eqiad.wmnet with reason: REIMAGE
  • 18:56 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: REVERT: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - T288853 (duration: 00m 56s)
  • 18:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1073.eqiad.wmnet with reason: REIMAGE
  • 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1074.eqiad.wmnet with reason: REIMAGE
  • 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1073.eqiad.wmnet with reason: REIMAGE
  • 18:52 otto@deploy1002: scap failed: average error rate on 6/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 18:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: REIMAGE
  • 18:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1069.eqiad.wmnet with reason: REIMAGE
  • 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: REIMAGE
  • 18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: REIMAGE
  • 18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: REIMAGE
  • 18:42 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:41 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1070.eqiad.wmnet with reason: REIMAGE
  • 18:41 Amir1: Deployed patch for T284419 second time
  • 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: REIMAGE
  • 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1070.eqiad.wmnet with reason: REIMAGE
  • 18:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1068.eqiad.wmnet with reason: REIMAGE
  • 18:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1068.eqiad.wmnet with reason: REIMAGE
  • 18:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/changetags/ChangeTags.php: b1f4b4e: ChangeTags: Set interface flag when parsing tag names (T291776) (duration: 00m 56s)
  • 18:30 cmjohnson1: updating firmware on sessionstore1003
  • 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 Amir1: Deployed patch for T284419
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2cb6f47: Growth: Promote 208 wikis out of dark mode (T290582) (duration: 00m 56s)
  • 17:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:46 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/Title.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part IV (duration: 00m 56s)
  • 17:44 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/parser/ParserCache.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part III (duration: 00m 56s)
  • 17:43 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/parser/ParserOutput.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part II (duration: 00m 57s)
  • 17:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.1/includes/page/Article.php: Backport: Expand local URLs to absolute URLs in ParserOutput (T263581), Part I (duration: 00m 59s)
  • 17:39 volans: uploaded spicerack_1.0.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:27 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2006.wikimedia.org
  • 17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2006.wikimedia.org
  • 17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica2005.wikimedia.org
  • 17:26 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica200*.wikimedia.org
  • 17:25 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica1004.wikimedia.org
  • 17:25 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica1003.wikimedia.org
  • 17:24 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-replica*
  • 17:24 dzahn@cumin1001: conftool action : set/weight=10; selector: name=ldap-ro*.eqiad.wmnet
  • 16:18 otto@puppetmaster1001: conftool action : set/ttl=300; selector: dnsdisc=eventgate-logging-external
  • 16:16 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 16:14 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:14 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:12 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:10 ottomata: reverting eventgate-logging-external chart change in codfw - T291504
  • 16:08 urbanecm: [urbanecm@mwmaint1002 ~]$ scap pull # T291836
  • 16:01 urbanecm: Livehack debugging at mwmaint1002 for T291836
  • 14:41 urbanecm: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php --statsd # measuring time backports saved
  • 14:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:38 otto@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 14:36 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/: 08f1e73: 3b154db: GrowthExperiments backports (T290609, T291658) (duration: 00m 58s)
  • 14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:33 volker-e@deploy1002: Finished deploy [design/style-guide@9b3b0fb]: Deploy design/style-guide: 9b3b0fb “Apps”: Fix typos and unify orthography (#491) (duration: 00m 06s)
  • 14:33 volker-e@deploy1002: Started deploy [design/style-guide@9b3b0fb]: Deploy design/style-guide: 9b3b0fb “Apps”: Fix typos and unify orthography (#491)
  • 14:30 otto@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:14 otto@deploy1002: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 14:11 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:11 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:59 otto@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 13:58 ottomata: beginning re-deploy of eventgate-logging-external - https://phabricator.wikimedia.org/T291504#7380252
  • 13:57 otto@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-logging-external
  • 13:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:48 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:36 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@04d2df4]: tegola: use eqiad discovery endpoin (duration: 00m 15s)
  • 13:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@04d2df4]: tegola: use eqiad discovery endpoin
  • 11:45 marostegui: Upgrade es4 in codfw to 10.4.21
  • 11:43 marostegui: Turn off es2021 for onsite maintenance T290327
  • 11:09 volans: re-enabled puppet on install hosts after deployment of g/723996 - T221388
  • 11:02 volans: disabling puppet on install hosts to deploy 723996 - T221388
  • 10:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 10:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 10:02 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 09:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 09:53 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 09:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
  • 09:44 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 09:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
  • 09:38 marostegui: Optimize table commonswiki.image on codfw (s4 will show lag) - T288273
  • 09:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 09:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 09:36 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 09:34 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica2006.wikimedia.org with reason: reboot - T291813
  • 09:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica2006.wikimedia.org with reason: reboot - T291813
  • 09:31 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica2005.wikimedia.org with reason: reboot - T291813
  • 09:30 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica2005.wikimedia.org with reason: reboot - T291813
  • 09:30 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 09:29 moritzm: systemctl reset-failed networking T273026
  • 09:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 09:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica1004.wikimedia.org with reason: reboot - T291813
  • 09:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica1004.wikimedia.org with reason: reboot - T291813
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
  • 09:24 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 09:23 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 09:22 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-fe1001.eqiad.wmnet
  • 09:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
  • 09:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on ldap-replica1003.wikimedia.org with reason: reboot - T291813
  • 09:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on ldap-replica1003.wikimedia.org with reason: reboot - T291813
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 09:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people1003.eqiad.wmnet with reason: reboot - T291813
  • 09:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people1003.eqiad.wmnet with reason: reboot - T291813
  • 09:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people2002.codfw.wmnet with reason: reboot - T291813
  • 09:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people2002.codfw.wmnet with reason: reboot - T291813
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 08:35 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 08:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 07:18 godog: swift eqiad-prod: add weight to ms-be10[64-67] - T290546
  • 07:07 marostegui: Remove flaggedimages from s3 T290340
  • 06:13 effie: rolling restart php-fpm in eqiad - T291052
  • 06:07 effie: upgrade php7.2 in eqiad - T291052
  • 05:56 marostegui: Drop labswiki from m5 T167973
  • 05:28 marostegui: Remove flaggedimages from s2 T290340

2021-09-26

  • 14:51 volker-e@deploy1002: Finished deploy [design/style-guide@aac0ae9]: Deploy design/style-guide: aac0ae9 “Apps”: Fix image path (#490) (duration: 00m 06s)
  • 14:51 volker-e@deploy1002: Started deploy [design/style-guide@aac0ae9]: Deploy design/style-guide: aac0ae9 “Apps”: Fix image path (#490)
  • 03:16 legoktm: killed queries on db1099
  • 03:14 legoktm: killing queries on db1105

2021-09-25

  • 02:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 01:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 01:24 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .

2021-09-24

  • 20:00 volker-e@deploy1002: Finished deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489) (duration: 00m 06s)
  • 20:00 volker-e@deploy1002: Started deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489)
  • 19:33 volker-e@deploy1002: Finished deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487) (duration: 00m 07s)
  • 19:33 volker-e@deploy1002: Started deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487)
  • 19:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/MovePage.php: MovePage: don't create a recent change for a redirect (T291677) (duration: 00m 57s)
  • 18:54 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/PageTriage/: Revert "Remove deprecated date.js library" (T291675) (duration: 00m 59s)
  • 18:53 legoktm@deploy1002: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 18:13 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 18:12 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 17:02 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 16:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:46 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:17 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001
  • 15:09 elukey: sudo cumin -m async -b2 "c:profile::analytics::cluster::hdfs_mount" "umount /mnt/hdfs" "mount /mnt/hdfs" - T288625
  • 14:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 14:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 Amir1: start of rebuilding metadata of images in commons to make them use json
  • 13:24 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 11:58 effie: upgrading scap on canaries - T291095
  • 11:39 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=tegola-vector-tiles
  • 11:32 effie: uploading scap-4.0.0 to buster-wikimedia and stretch-wikimedia
  • 11:17 effie: restart pybal in low traffic load balancers
  • 10:44 jynus: corrupting and fixing image metadata on testwiki before running script on commons T290462
  • 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 10:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
  • 09:39 jynus: upgrade and restart db2099
  • 09:32 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
  • 09:25 marostegui: Rename flaggedimages on db1096(ruwiki) and db1098(arwiki) T290340
  • 09:25 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 09:09 jynus: upgrade and restart db2139, db2101
  • 09:03 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
  • 08:35 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 08:22 jynus: upgrade and restart db2098 T290868
  • 08:20 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2002.wikimedia.org
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts mx2002.wikimedia.org
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1002.wikimedia.org
  • 07:34 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts mx1002.wikimedia.org
  • 07:01 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 07:01 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 07:00 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 06:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 06:44 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
  • 06:41 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
  • 06:30 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
  • 06:26 elukey: restart archiva on archiva1002 to pick up new openjdk upgrades
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17324 and previous config saved to /var/cache/conftool/dbconfig/20210924-061105-root.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17323 and previous config saved to /var/cache/conftool/dbconfig/20210924-055601-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17322 and previous config saved to /var/cache/conftool/dbconfig/20210924-054057-root.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17321 and previous config saved to /var/cache/conftool/dbconfig/20210924-052554-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17320 and previous config saved to /var/cache/conftool/dbconfig/20210924-051050-root.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 T291584', diff saved to https://phabricator.wikimedia.org/P17319 and previous config saved to /var/cache/conftool/dbconfig/20210924-050739-marostegui.json
  • 01:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:16 krinkle@deploy1002: Synchronized wmf-config/profiler.php: I25f4b70b9d4b (duration: 00m 57s)
  • 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:39 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/resources/src/mediawiki.searchSuggest/searchSuggest.js: Hiding fallback button depends on HTML order (T291272) (duration: 00m 57s)
  • 00:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-23

  • 23:38 foks: running wm-scripts/mcdc2021/populateEditCount.php on each wiki (s1 thru s8 simultaneously) https://phabricator.wikimedia.org/T291668
  • 22:58 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 22:58 foks: creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668
  • 22:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:33 reedy@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s)
  • 22:27 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck
  • 22:27 ryankemper: T280001 The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/${DC}/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*`
  • 22:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:18 ryankemper: T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10`
  • 22:17 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wcqs.*
  • 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:13 ryankemper: T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443`
  • 22:13 ryankemper: T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443`
  • 22:06 ryankemper: T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'`
  • 22:06 ryankemper: T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
  • 22:05 ryankemper: T280001 [Cleanup required] `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous)
  • 22:05 ryankemper: T280001 [Sanity check] `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
  • 22:04 ryankemper: T280001 Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'`
  • 22:03 ryankemper: T280001 Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`...
  • 22:03 ryankemper: T280001 Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
  • 22:00 ryankemper: T280001 Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`...
  • 21:59 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs`
  • 21:59 ryankemper: T280001 Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well
  • 21:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:53 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 21:43 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 21:43 foks: altering some rows in the `securepoll_elections` table on metawiki
  • 21:36 ryankemper: T280001 `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved
  • 21:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 ryankemper: T280001 `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t T280001 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook)
  • 21:24 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 21:23 ryankemper: T280001 Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error)...still need to run netbox cookbook and possibly a manual `sudo authdns-update`
  • 21:19 ryankemper: The pybal side of the changes looks good, but I made a mistake with the assigning of IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing...
  • 21:05 ryankemper: T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'`
  • 21:04 ryankemper: T280001 Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`...
  • 21:04 ryankemper: T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
  • 21:00 ryankemper: T280001 Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding
  • 21:00 ryankemper: T280001 `TCP 10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n ` and `TCP 10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
  • 20:58 brennen: canceling backport training window for 2021-09-23
  • 20:54 ryankemper: T280001 Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2010*,lvs1016*}' 'sudo systemctl restart pybal'`
  • 20:53 ryankemper: T280001 Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`...
  • 20:53 ryankemper: T280001 Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
  • 20:47 ryankemper: T280001 Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly
  • 20:04 dduvall: 1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates (T281165)
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:50 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1
  • 19:40 kostajh: UTC morning backport window done
  • 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:39 kharlan@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020) (duration: 01m 05s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:02 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3323ce (duration: 01m 07s)
  • 18:58 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721089 to see if it resolves the `confd` error that popped up
  • 18:57 krinkle@deploy1002: Synchronized wmf-config/logging.php: I2cd81a (duration: 01m 05s)
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
  • 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 17:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 volans: uploaded spicerack_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:38 ryankemper: T280001 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/713959, running puppet on `*w*qs*` (i.e. wcqs and wdqs)
  • 16:13 elukey: reboot an-worker1096 to see if megacli status for a new disk changes - T290805
  • 16:09 brennen: gitlab1001: reverting gitlab cas: uid instead of CN; add nickname_key for T288392, as existing user logins are broken.
  • 15:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder/' | mwscript purgeList.php # T285761
  • 15:54 brennen: gitlab1001: brief downtime to apply gitlab cas: uid instead of CN; add nickname_key for T288392
  • 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:58 reedy@deploy1002: Synchronized wmf-config/reverse-proxy-staging.php: T291643 (duration: 01m 05s)
  • 14:19 moritzm: removed routers filter for mx1001, reimage to bullseye complete T286911
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:14 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:53 effie: upgrade php7.2 on codfw - T291052
  • 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 13:36 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 13:34 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 13:28 marostegui: Deploy schema change on s8 codfw wikidatawiki.wb_changes T291584
  • 13:27 moritzm: reimaging mx1001 to bullseye T286911
  • 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: reimage
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: reimage
  • 13:23 jbond: merge refactor of resolv.conf puppet class - (gerrit 717241)
  • 13:14 marostegui: Deploy schema change on s4 {commonswiki,testcommonswiki}.wb_changes T291584
  • 13:11 marostegui: Deploy schema change on s3 testwikidatawiki.wb_changes T291584
  • 13:09 elukey: update pcc facts (after change in puppetdb's fact filter list, to allow partitions for analytics)
  • 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:19 marostegui: Upgrade db2081 db2082 db2083 db2084 db2091 db2152 T290868
  • 11:16 kostajh: UTC morning backport and config deploys done
  • 11:15 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Place new dewiki accounts in control group (T288420) (duration: 01m 06s)
  • 11:10 jynus: restart and upgrade db2141 T290865
  • 10:55 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:53 moritzm: mx1001 filterered on the routers for forthcoming reimage to bullseye T286911
  • 10:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:51 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:50 marostegui: Upgrade db2102 db2116 db2130 db2145 db2146
  • 10:47 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 09:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:55 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 09:52 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 09:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:40 moritzm: reinstalling mx2002 (test server) to validate bullseye installs are fixed
  • 09:31 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:30 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:29 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:04 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (2/2) (T289227) (duration: 01m 05s)
  • 08:02 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (1/2) (T289227) (duration: 01m 06s)
  • 08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:54 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (3/3) (duration: 01m 05s)
  • 07:52 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (2/3) (duration: 01m 05s)
  • 07:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (1/3) (duration: 01m 06s)
  • 07:10 tgr: running `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$WIKI --search-index --db-table --statsd` for growthexperiments.dblist wikis
  • 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 06:56 marostegui: Upgrade db2116
  • 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:53 marostegui: Upgrade db2085, db2088 and db2092
  • 05:24 marostegui: Optimize ruwiki.logging on codfw T286102
  • 02:55 eileen: civicrm revision changed from 14658445a2 to 18228490ae, config revision is 77cb7ec866
  • 02:06 RoanKattouw: Deployed patch for T291600
  • 01:05 eileen: tools revision changed from 1d67c52c12 to d90f4c91ee
  • 00:35 catrope@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/MediaSearch/: Use text() instead of parse() for MediaSearch UI messages (T291590) (duration: 01m 08s)
  • 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-22

  • 22:51 mutante: mx2001 - re-enabled puppet
  • 20:48 ryankemper: [WDQS] After puppet-merging, running puppet on `miscweb*`, and doing a `ryankemper@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder' | mwscript purgeList.php`, https://query.wikidata.org/querybuilder is working properly again
  • 20:39 ryankemper: [WDQS] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/722958/ which should (hopefully) resolve an issue where https://query.wikidata.org/querybuilder gives a 404, whereas https://query.wikidata.org/querybuilder/ works (due to the trailing slash avoiding the rewrite regex)
  • 20:38 ryankemper: `[WCQS]` `wcqs1001.eqiad.wmnet` is reachable again following the powercycle
  • 20:20 ryankemper: `[WCQS]` Ran `racadm>>racadm serveraction powercycle` on `wcqs1001.mgmt.eqiad.wmnet`
  • 20:18 ryankemper: `[WCQS]` `wcqs1001` is ssh unreachable (https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wcqs1001&service=SSH), will try restarting from mgmt console
  • 19:29 dduvall: 1.38.0-wmf.1 promoted to group1. no new errors or rising error rates (T281165)
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:20 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.1 (duration: 01m 11s)
  • 19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.1
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:11 dduvall@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/CentralAuth: Backport: Avoid $wgUser deprecation warnings (T291515) (duration: 01m 06s)
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:32 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEditPanel.js: Post-edit Panel: Set task.pageviews to null rather than undefined (T291510) (duration: 01m 05s)
  • 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: logging: send DuplicateParse bucket to Logstash (duration: 01m 05s)
  • 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add new Shellboxes (duration: 01m 16s)
  • 18:03 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
  • 17:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:38 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 17:38 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/: Restore deprecated API token methods (3/3) (duration: 01m 07s)
  • 17:36 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/autoload.php: Restore deprecated API token methods (2/3) (duration: 01m 05s)
  • 17:34 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/ApiTokens.php: Restore deprecated API token methods (1/3) (duration: 01m 05s)
  • 16:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
  • 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:53 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
  • 16:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove wmgFileBlacklist (duration: 01m 06s)
  • 16:49 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46] (duration: 06m 17s)
  • 16:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgProhibitedFileExtensions (duration: 01m 05s)
  • 16:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:45 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgProhibitedFileExtensions (duration: 01m 07s)
  • 16:43 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46]
  • 16:41 mutante: [netmon1002:~] $ sudo systemctl start rancid-differ
  • 16:41 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename wgShortPagesNamespaceBlacklist to wgShortPagesNamespaceExclusions (duration: 01m 05s)
  • 16:40 mutante: [netmon1002:~] $ sudo systemctl start rancid-clean-logs
  • 16:39 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename wgEnableUserEmailBlacklist to wgEnableUserEmailMuteList (duration: 01m 05s)
  • 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46] (duration: 00m 07s)
  • 16:37 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46]
  • 16:36 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 16:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:35 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wgMimeTypeExclusions and set wgProhibitedFileExtensions not wgFileBlacklist (duration: 01m 05s)
  • 16:32 joal@deploy1002: Finished deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46] (duration: 18m 19s)
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:14 joal@deploy1002: Started deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46]
  • 16:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set jQuery migrate to false everywhere except metawiki (T280944) (duration: 01m 56s)
  • 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
  • 15:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 15:56 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f] (duration: 06m 17s)
  • 15:52 moritzm: removed filters on mx1001 filterered on the routers due to an issue with the mx1001 reinstall T286911
  • 15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f]
  • 15:49 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f] (duration: 00m 07s)
  • 15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f]
  • 15:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node" (duration: 00m 15s)
  • 15:15 mbsantos@deploy1002: Started deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node"
  • 15:02 moritzm: re-installing mx1001 with bullseye T286911
  • 14:47 volans: upgraded spicerack to 1.0.0 on cumin hosts
  • 14:14 volans: uploaded spicerack_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:39 herron: flushed mx1001 mail queue to mx2001 T286911
  • 13:26 moritzm: mx1001 filterered on the routers for forthcoming reimage to bullseye T286911
  • 13:23 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f] (duration: 18m 25s)
  • 13:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10% (duration: 00m 14s)
  • 13:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10%
  • 13:04 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f]
  • 12:56 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5% (duration: 00m 15s)
  • 12:55 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5%
  • 12:46 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node (duration: 00m 14s)
  • 12:46 mbsantos@deploy1002: Started deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node
  • 11:46 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:38 jbond: enable puppet fleet wide to post puppetdb restart
  • 11:33 jbond: disable puppet fleet wide to preforme puppdb restart
  • 11:11 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:50 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:31 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:20 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:51 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:38 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 08:46 effie: upgrade php7.2 on api-canaries and restart service - T291052
  • 06:02 elukey: update pcc facts
  • 05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-syntaxhighlight
  • 05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
  • 05:47 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-media
  • 05:31 legoktm: restarting pybal on lvs2009
  • 05:27 legoktm: restarting pybal on lvs2010
  • 05:23 legoktm: restarting pybal on lvs1015
  • 05:17 legoktm: restarting pybal on lvs1016
  • 05:12 legoktm: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
  • 04:48 legoktm: ran authdns-update for adding new shellbox svc entries https://gerrit.wikimedia.org/r/721908

2021-09-21

  • 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:56 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 22:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 21:58 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 21:16 cstone: payments-wiki revision is 23d0ffac66
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:54 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable 'DuplicateParse' logging bucket (duration: 01m 07s)
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 ryankemper: T280001 `sre.dns.netbox` completed successfully
  • 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.1
  • 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:57 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 18:56 ryankemper: T280001 Running `sudo -i cookbook sre.dns.netbox -t T280001 'Added wcqs.svc.[eqiad,codfw].wmnet'` per final step of https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only)...
  • 18:53 ryankemper: T280001 `for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any wcqs.svc.[eqiad,codfw].wmnet ; done` looks as expected
  • 18:48 ryankemper: T280001 `OK - authdns-update successful on all nodes!`
  • 18:45 ryankemper: T280001 `ryankemper@authdns1001:~$ sudo authdns-update`
  • 18:44 ryankemper: T280001 Merging https://gerrit.wikimedia.org/r/c/operations/dns/+/713929; will follow steps in https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile post-merge
  • 17:56 cstone: payments-wiki revision is 23d0ffac66
  • 17:49 dduvall: 1.38.0-wmf.1 deployed to testwikis (T281165)
  • 17:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:48 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.1 (duration: 35m 44s)
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:39 elukey: update pcc facts
  • 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:35 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:27 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:12 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.1
  • 17:08 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:51 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:33 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:14 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:46 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:39 elukey: update pcc facts
  • 15:26 effie: upgrade php7.2 on app-canaries and restart service - T291052
  • 15:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from codfw T167973', diff saved to https://phabricator.wikimedia.org/P17307 and previous config saved to /var/cache/conftool/dbconfig/20210921-150958-marostegui.json
  • 14:35 XioNoX: re-enable AMS-IX peering sessions - T291407
  • 14:17 XioNoX: temporarily downpref Telia-Deutsch Telekom to not saturate Telia transit - T291407
  • 13:52 XioNoX: disable AMS-IX peering sessions for maintenance - T291407
  • 13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:37 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 13:18 effie: upgrading php on wtp* servers to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 && rolling service restart - T291052
  • 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 12:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2025.codfw.wmnet
  • 11:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 11:45 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Configure event stream for map tile state change - 3b01ef587 (duration: 00m 57s)
  • 11:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 09:59 _joe_: rebuilding openjdk8* image, ruby, nodejs-slim for T291458
  • 09:46 _joe_: deneb:~# docker-registryctl delete-tags docker-registry.wikimedia.org/fluentd T291458
  • 09:44 _joe_: deleting images for graphoid, T291458
  • 05:16 kart_: Upgraded cxserver to 2021-09-16-130208-production
  • 05:12 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:03 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:58 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:16 tgr: Evening deploys done
  • 00:16 tgr@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: Backport: AddLink: Skip over headings in phrase matching (T291361) (duration: 00m 57s)
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-20

  • 23:31 ejegg: updated fundraising CiviCRM from e6bf81d99c to 14658445a2
  • 23:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 23:22 mutante: LDAP - added georginaburnett-wmde to NDA group (T291391, T273780)
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:21 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 22:14 mutante: wdqs1004 - depool
  • 22:10 mutante: wdqs1004 - service wdqs-updater restart
  • 22:06 mutante: wdqs1004 - HTTP/1.1 503 Service Unavailable - systemctl restart wdqs-blazegraph
  • 22:05 foks: changing user email for MIskander (WMF)@collabwiki
  • 21:41 mutante: ms-fe1005 - systemctl start swift_dispersion_stats.service (gerrit:719285)
  • 21:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert "Disable jQuery Migrate on group1" (T291410) (duration: 00m 56s)
  • 17:02 legoktm: repooled codfw (traffic/caches) 1 week after DC switchover
  • 16:41 effie: upgrading php on wtp[1025-1029] to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 - T291052
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17305 and previous config saved to /var/cache/conftool/dbconfig/20210920-144844-root.json
  • 14:42 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17304 and previous config saved to /var/cache/conftool/dbconfig/20210920-143340-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17303 and previous config saved to /var/cache/conftool/dbconfig/20210920-141836-root.json
  • 14:11 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17302 and previous config saved to /var/cache/conftool/dbconfig/20210920-140333-root.json
  • 13:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 13:45 moritzm: restarting apache on Logstash ELK5 cluster to pick up GNUTLS update T283165
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 13:20 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
  • 13:13 damilare: updated payments-wiki from f9cbf95a12 to 23d0ffac66
  • 12:59 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:58 marostegui: Drop ct_tag_id_log key from db1144:3314 T277416
  • 12:54 moritzm: installing gnutls28 updates for stretch with backport for forthcoming Let's encrypt issuance chain update (T283165)
  • 12:42 marostegui: Add ct_tag_id_log key to db1144:3314 T277416
  • 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:31 urbanecm@deploy1002: Finished scap: b9031bc: Mentor dashboard: Mentor tools (T280307) (duration: 11m 44s)
  • 11:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:20 urbanecm@deploy1002: Started scap: b9031bc: Mentor dashboard: Mentor tools (T280307)
  • 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable jQuery Migrate on group1 (T280944) (duration: 00m 56s)
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b518d8b: Mentor dashboard: Enable beta mode at testwiki (T281534) (duration: 00m 55s)
  • 11:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/: b9031bc: Mentor dashboard: Mentor tools (T280307; 5) (duration: 00m 56s)
  • 11:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/ServiceWiring.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 4) (duration: 00m 56s)
  • 11:09 hnowlan: roll restarting restbase service in codfw
  • 11:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/Modules/MentorTools.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 2) (duration: 00m 55s)
  • 11:07 urbanecm@deploy1002: sync-file aborted: b9031bc: Mentor dashboard: Mentor tools (T280307; 1) (duration: 00m 00s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MentorTools/MentorStatusManager.php: b9031bc: Mentor dashboard: Mentor tools (T280307; 1) (duration: 00m 57s)
  • 11:05 hnowlan: roll restarting restbase service in eqiad for openssl updates
  • 10:45 hnowlan: roll restarting kartotherian and tilerator on maps2*
  • 10:41 hnowlan: roll restarting kartotherian and tilerator on maps1*
  • 10:36 jynus: rolling restart bacula & minio daemons on backup hosts
  • 09:59 moritzm: restarting apache2 on thorium
  • 09:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from eqiad T167973', diff saved to https://phabricator.wikimedia.org/P17300 and previous config saved to /var/cache/conftool/dbconfig/20210920-094739-marostegui.json
  • 09:10 moritzm: installing openssl1.0 updates for stretch with backport for forthcoming Let's encrypt issuance chain update (T283165)
  • 08:35 moritzm: updating clamav on ticket.wikimedia.org/otrs1001 to 0.103.3
  • 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:49 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main T290982
  • 07:48 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main
  • 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:43 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:35 marostegui: Stop db1168 and db2129 in sync T167973
  • 07:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:34 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: af9d6e4: Revert "Add throttle rule for Czech wiki course" (duration: 00m 56s)
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 T167973', diff saved to https://phabricator.wikimedia.org/P17299 and previous config saved to /var/cache/conftool/dbconfig/20210920-073256-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17298 and previous config saved to /var/cache/conftool/dbconfig/20210920-073206-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17297 and previous config saved to /var/cache/conftool/dbconfig/20210920-073141-marostegui.json
  • 07:31 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 to apt.wikimedia.org (component/php7.2 for buster-wikimedia) T291052
  • 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c1d665: enwiki: Bump Growth features to 25% (mentorship limited to 20% of those users) (T290927) (duration: 00m 57s)
  • 07:20 urbanecm: Revert undeployed config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721959); not even pulled to deployment, so assuming it never hit prod (T289771)
  • 06:00 marostegui: Upgrade db2071, db2072, db2094

2021-09-18

  • 01:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s)
  • 01:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s)

2021-09-17

  • 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:19 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 19:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
  • 17:02 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
  • 16:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 16:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:49 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 13:06 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
  • 11:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:37 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)
  • 09:37 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency
  • 09:36 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)
  • 09:19 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist
  • 08:00 jayme: restarting php-fpm on wtp1037 and wtp1030
  • 02:28 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`
  • 02:22 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer
  • 01:55 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`
  • 01:48 ryankemper: T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'`
  • 00:04 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 00:01 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .

2021-09-16

  • 23:58 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
  • 23:51 ryankemper: T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`
  • 23:44 ryankemper: T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger
  • 23:39 ryankemper: T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`
  • 23:37 ryankemper: T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'`
  • 23:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 23:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 23:19 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 23:18 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 23:18 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:17 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 23:17 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 23:16 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:38 legoktm@deploy1002: Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)
  • 22:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:23 legoktm@deploy1002: Started scap: i18n for restoring deprecated token APIs
  • 22:21 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)
  • 22:19 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)
  • 22:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)
  • 21:22 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
  • 21:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
  • 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set jQuery migrate to false for wikibooks and Commons (T280944) (duration: 00m 56s)
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.23
  • 18:55 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:50 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:46 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: bb8cba1: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (2/2) (duration: 01m 06s)
  • 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/extension.json: bb8cba1: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (1/2) (duration: 01m 07s)
  • 17:54 volans: turn of lldp agent on NIC (both ports) on ms-be105[1-9],ms-be205[2-6] - T290984
  • 17:31 volans: turn of lldp agent on NIC (both ports) on ms-be2051 - T290984
  • 17:09 jynus: deployed extra grants for admin user on s6 primary
  • 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
  • 16:17 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
  • 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position) T167973
  • 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position)
  • 15:52 bd808: marostegui is awesome and made wikitech better today. :)
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech on read-only for maintenance T287454', diff saved to https://phabricator.wikimedia.org/P17283 and previous config saved to /var/cache/conftool/dbconfig/20210916-150444-marostegui.json
  • 15:03 marostegui: Set wikitech on read-only (from now on all SAL changes will fail) T167973
  • 14:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
  • 14:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
  • 14:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
  • 14:35 mutante: reimaging mwmaint2002 to buster (T267607, T245757)
  • 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
  • 14:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
  • 14:12 mutante: switching https://noc.wikimedia.org from codfw to eqiad (T287539, T267607)
  • 13:44 sukhe: homer: running for Gerrit: 721018: set up BGP peering to durum hosts in {eqiad,codfw,esams,ulsfo,eqsin}
  • 13:25 effie: pool mw1422 mw1455
  • 13:24 effie: poiol mw1422 mw1455
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:12 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 01m 04s)
  • 13:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
  • 12:08 marostegui: Deploy schema change on s2 codfw (lag will show up) T290057
  • 12:00 mbsantos: start OSM re-import script in maps2009 (depooled)
  • 11:51 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: 529f86c: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees (T291088) (duration: 01m 04s)
  • 11:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: 9e0f6f8: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees (T291088) (duration: 01m 04s)
  • 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: Fixing incorrect deployment of 01e4450 for T291123. This is supposed to be a no-op. (duration: 01m 05s)
  • 11:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23 (wmf/1.37.0-wmf.23 * u+2-2)]$ git rebase && git submodule update extensions/AbuseFilter/ # fixing an incorrect deployment that happened in T291123
  • 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23/extensions/AbuseFilter (wmf/1.37.0-wmf.23 u=)]$ git co 0d2bc7c # reset repo to expected state, fixing incorrect deploy of a backport in T291123
  • 11:34 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
  • 11:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 11:21 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add new WikimediaBadges config (T232927) (2/2) (duration: 01m 05s)
  • 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add new WikimediaBadges config (T232927) (1/2) (duration: 01m 05s)
  • 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 11:03 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 10:59 hashar@deploy1002: Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - T146416 T291124 (duration: 01m 06s)
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:21 topranks: Changing default gateway on mw1422 to use VRRP backup (cr2), to determine if tail drops from switches to cr1 is cause of TCP retransmissions.
  • 10:14 effie: depool mw1455 for network testing
  • 10:11 effie: depool mw1422 for network testing
  • 10:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 10:01 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 10:00 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 10:00 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2002.wikimedia.org with reason: reimage
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2002.wikimedia.org with reason: reimage
  • 09:10 moritzm: in-place re-installation of mx2002.wikimedia.org (test VM) to test the new installer key support in the sre.puppet.renew-cert cookbook
  • 08:04 moritzm: upgrading scandium to PHP 7.2 backport of patch for enhanced DOM replaceChild/removeChild performance T291052
  • 07:48 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
  • 05:35 marostegui: Optimize dewiki.logging in codfw T287344

2021-09-15

  • 23:02 legoktm: upgrading lists1001 to use postorius 1.3.5
  • 22:51 legoktm: uploaded new mailmanclient/postorius packages to apt1001
  • 22:38 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 22:03 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 22:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@902529b]: 0.3.85 (duration: 06m 59s)
  • 21:56 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.85` on canary `wdqs1003`; proceeding to rest of fleet
  • 21:55 ryankemper@deploy1002: Started deploy [wdqs/wdqs@902529b]: 0.3.85
  • 21:55 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.85`. Pre-deploy tests passing on canary `wdqs1003`
  • 21:42 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs (duration: 02m 07s)
  • 21:40 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 60e7e51: Set wmgEchoEnablePush to false explicitly on arbcom_* wikis (T291128) (duration: 01m 06s)
  • 19:50 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: sync backport for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/721312 (duration: 01m 06s)
  • 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback all wikis to 1.37.0-wmf.23
  • 19:07 urbanecm: Re-start server-side upload for 1 video file, likely temporary swift failure (T289781)
  • 19:06 urbanecm: Start server-side upload for 1 video file (T287686)
  • 19:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 00m 55s)
  • 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
  • 18:52 urbanecm: Start server-side upload for 1 video file (T289949)
  • 18:50 urbanecm: Start server-side upload for 1 video file (T289781)
  • 18:44 urbanecm: Start server-side upload for 3 large PDF files (T290722)
  • 18:43 legoktm: migrated sitereq-l@ from Google Groups to Mailman (T290908)
  • 18:27 urbanecm: Start server-side upload for 1 video file (T290290)
  • 18:23 urbanecm: Start server-side upload for 1 video file (T290685)
  • 18:21 urbanecm: Start server-side upload for 1 video file (T290707)
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7620084: Add portrattarkiv.se to wgCopyUploadsDomains whitelist of Wikimedia Commons (T290581) (duration: 01m 05s)
  • 17:39 mutante: thumbor - running puppet on all thumbor hosts, removed cron job systemd-thumbor-tmpfiles-clean, added thumbor_systemd_tmpfiles_clean timer job
  • 16:56 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3] (duration: 06m 15s)
  • 16:50 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3]
  • 16:47 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3] (duration: 00m 07s)
  • 16:47 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3]
  • 16:45 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3] (duration: 19m 43s)
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5002.eqsin.wmnet
  • 16:26 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3]
  • 16:19 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5002.eqsin.wmnet
  • 16:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5001.eqsin.wmnet
  • 16:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5001.eqsin.wmnet
  • 15:56 urbanecm: Remove 2FA for User:Rho at wikitech, identity verified via a videocall
  • 14:50 moritzm: installing lz4 security updates on stretch
  • 13:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:33 ottomata: pointing {stats,analytics}.wikimedia.org at analytics-web.discovery.wmnet cname - T285355
  • 13:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4002.ulsfo.wmnet
  • 13:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4002.ulsfo.wmnet
  • 13:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4001.ulsfo.wmnet
  • 13:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4001.ulsfo.wmnet
  • 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:41 marostegui: Install 10.4.21-2 on db1125
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:21 Lucas_WMDE: EU backport+config window done
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable change-tags for new edits' proofread status at mulWS (T289140) (duration: 01m 06s)
  • 11:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Don’t check constraints on two property qualifiers (T235292) (duration: 01m 11s)
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
  • 09:55 effie: depool wtp1026
  • 09:54 effie: depooling mw1312 and mw1319
  • 09:46 topranks: Disabling Intel X710 NIC on-board LLDP processing on relforge1003 (T290984)
  • 07:04 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:57 elukey: shutdown ms-be2045 (again) after seeing T290881
  • 06:02 elukey: powercycle ms-be2045 - no ssh, no remote tty available
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1109 original load', diff saved to https://phabricator.wikimedia.org/P17274 and previous config saved to /var/cache/conftool/dbconfig/20210915-052802-marostegui.json
  • 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17273 and previous config saved to /var/cache/conftool/dbconfig/20210915-043053-marostegui.json

2021-09-14

  • 23:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Re-enable VipsScaler (2 of 2) (duration: 01m 04s)
  • 22:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable VipsScaler (1 of 2) (duration: 01m 05s)
  • 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:43 legoktm: legoktm@cumin2001:~$ sudo systemctl reset-failed # clear httpbb_hourly_tests failure, moved to cumin1001
  • 22:34 legoktm@deploy1002: Finished scap: Rebuild i18n for redeployment of VipsScaler (T290759) (duration: 23m 49s)
  • 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:11 legoktm@deploy1002: Started scap: Rebuild i18n for redeployment of VipsScaler (T290759)
  • 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:20 dancy: testing upcoming Scap release on beta
  • 20:20 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Early adopt wgIncludejQueryMigrate=false on nlwiki (T280944) (duration: 01m 48s)
  • 20:06 cdanis: T290425 ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data lyfcttm2lhw4
  • 20:06 cdanis: T290425 ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data h5mvbny28713
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.23
  • 18:48 moritzm: removed filter for tcp/25 on mx2001, reimage is complete T286911
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2982638: Offer the DiscussionTools reply tool as opt-out setting at ptwikinews (T285162) (duration: 01m 06s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7f1de32: Offer the DiscussionTools reply tool as opt-out setting at Wikimania wiki (T284339) (duration: 01m 05s)
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e36f4d3: DiscussionTools: Make newtopictool available to everyone on arwiki and cswiki (T285724) (duration: 01m 04s)
  • 18:09 urbanecm@deploy1002: Synchronized debug.json: Idef64e72 (duration: 01m 29s)
  • 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: reimage
  • 17:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: reimage
  • 17:45 moritzm: reimaging mx2001 to bullseye T286911
  • 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
  • 15:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
  • 15:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
  • 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 37 hosts
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.remove-downtime for 37 hosts
  • 15:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-update-tendril (exit_code=0)
  • 15:11 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-update-tendril
  • 15:10 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 15:06 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 15:05 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17271 and previous config saved to /var/cache/conftool/dbconfig/20210914-150458-marostegui.json
  • 15:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:58 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17270 and previous config saved to /var/cache/conftool/dbconfig/20210914-145522-marostegui.json
  • 14:54 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:54 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:53 jelto@cumin2002: END (ERROR) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=97)
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17269 and previous config saved to /var/cache/conftool/dbconfig/20210914-145324-marostegui.json
  • 14:52 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:49 jelto@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=99)
  • 14:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:46 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:46 jelto@cumin2002: MediaWiki read-only period ends at: 2021-09-14 14:46:30.570035
  • 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:43 jelto@cumin2002: MediaWiki read-only period starts at: 2021-09-14 14:43:48.272827
  • 14:43 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: DC switchover
  • 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: DC switchover
  • 14:39 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:39 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:34 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:32 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:30 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:24 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:22 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:22 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Avoid warning about undefined $wgFileBlacklist (T290640) (duration: 01m 32s)
  • 13:44 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
  • 13:43 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
  • 13:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names (duration: 00m 14s)
  • 13:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names
  • 13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
  • 13:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
  • 13:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1ebdca4]: (no justification provided) (duration: 00m 15s)
  • 13:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1ebdca4]: (no justification provided)
  • 12:32 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:32 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:29 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:19 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
  • 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 10:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.20 (duration: 01m 48s)
  • 09:47 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.19 (duration: 04m 13s)
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 09:38 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.23 (duration: 70m 39s)
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
  • 09:09 Emperor: swift rebalance to remove h/w faulty host ms-be2045 T290881
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:47 moritzm: installing testvm2002
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 08:27 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.23
  • 08:25 godog: poweroff ms-be2045 and set it as failed in netbox - T290881
  • 08:24 hashar: train: applied security patches for 1.37.0-wmf.23 # T281164
  • 08:05 godog: wipe non-os partitions from ms-be2045 - T290881
  • 07:50 vgutierrez: update acme-chief to version 0.31 on acmechief hosts - T290249
  • 04:47 eileen: civicrm revision changed from 1f071f6c6c to e6bf81d99c, config revision is 23eda8ba3a
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:07 James_F: wmf/1.37.0-wmf.23 was branched at ea72c9b for T281164
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-13

  • 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:45 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T290759: Undeploy VipsScaler: III – Don't set wmgUseVips, now ignored (duration: 00m 58s)
  • 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:41 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: T290759: Undeploy VipsScaler: II – Don't load regardless of config (duration: 00m 58s)
  • 19:52 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T290759 Undeploy VipsScaler: I – Disable on all wikis (duration: 00m 57s)
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:59 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki={cswiki,cswikiversity} --signup --ip=185.47.223.49 # T290809
  • 18:58 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 9db1d1a: Add throttle rule for Czech wiki course (T290809) (duration: 00m 58s)
  • 18:29 ryankemper: [Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large
  • 18:25 razzi: reenable replication on dbstore1007 for T290841
  • 18:16 cwhite: apply high log volume from ES mitigations to deprecated inputs
  • 18:13 razzi: razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for T290841
  • 18:05 razzi: sudo systemctl restart mariadb@s2.service
  • 17:48 ryankemper: [Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83%
  • 17:20 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
  • 17:17 ryankemper: [Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered
  • 16:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
  • 16:18 legoktm@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
  • 16:16 volans@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.*
  • 16:08 volans@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.*
  • 16:06 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet
  • 15:54 moritzm: filtered mx2001 on the routers for reimage T286911
  • 15:43 vgutierrez: update acme-chief to version 0.31 on acmechief-test hosts - T290249
  • 15:40 vgutierrez: upload acme-chief 0.31 to apt.wm.o (buster) - T290249
  • 15:32 jelto: Traffic: depool codfw from user traffic
  • 15:26 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:25 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:25 volans@cumin1001: START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet
  • 15:20 Emperor: rebooting ms-be2045 to see if that brings the disk back properly T290881
  • 15:13 jelto@cumin2002: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
  • 15:13 legoktm: (cotd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero)
  • 15:13 rzl: (contd.) box-constraints|similar-users|termbox|thanos-query|thanos-swift|wdqs|wdqs-internal|wikifeeds|zotero)
  • 15:12 jelto@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|eventstreams-internal|kartotherian|linkrecommendation|mathoid|mobileapps|ores|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|shellbox|shell
  • 15:02 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 15:02 topranks: Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm.
  • 14:56 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:44 herron: drained mx2001 mail queue to mx1001 T286911
  • 14:38 dcausse: restarting wdqs-updater.service on all wdqs servers
  • 14:21 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 14:20 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 14:13 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:13 legoktm: (cotd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad
  • 14:12 jelto@cumin2002: Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex
  • 14:12 jelto@cumin2002: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:05 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet
  • 13:51 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet
  • 13:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet
  • 13:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet
  • 13:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet
  • 13:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet
  • 13:20 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet
  • 13:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet
  • 12:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:26 kostajh: European mid-day backport window deploys done
  • 11:24 kharlan@deploy1002: Synchronized wmf-config: Config: WikimediaEvents: Remove UnderstandingFirstDay config (duration: 00m 59s)
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
  • 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet
  • 09:33 volans: restarting tcpircbot-logmsgbot on alert1001, not relying messages
  • 09:18 elukey: upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2
  • 09:16 godog: swift eqiad-prod: add weight to ms-be10[64-67] - T290546
  • 09:11 moritzm: reimaging sretest1002
  • 09:11 elukey: upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - T277739
  • 08:16 godog: bump +100G prometheus/ops codfw

2021-09-12

  • 18:33 vgutierrez: restart varnish-fe on cp3061, cp3063 and cp3065
  • 18:29 vgutierrez: restart varnish on cp3055
  • 18:26 vgutierrez: restart varnish on cp3057
  • 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-11

  • 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 27814b8: testwiki: Fully remove securepoll-related groups (T290808) (duration: 00m 57s)
  • 18:35 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki {electionadmin,electcomm} # T290808
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 908bbf3: Revert "test: Add electcomm and electionadmin groups" (T290808) (duration: 00m 58s)

2021-09-10

  • 21:28 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 21:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 21:21 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
  • 20:46 jhuneidi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 20:44 jhuneidi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 20:42 jhuneidi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 18:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
  • 18:08 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
  • 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
  • 16:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
  • 16:14 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
  • 16:03 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 15:39 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
  • 15:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 14:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:43 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:31 XioNoX: push pfw policies - T290611
  • 09:07 mutante: planet - deleted all state files for all languages, running fresh update via systemctl start for all languages after proxy changes (T285251)
  • 08:37 jynus: upgrade and restart db2139
  • 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:58 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-workers - T289766
  • 07:57 moritzm: installing ntfs-3g security updates
  • 07:46 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:45 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:25 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-staging - T289766
  • 07:19 jayme: importes rsyslog 8.1901.0-1~bpo9+wmf2 to stretch-wikimedia - T289766
  • 06:56 effie: disable puppet on deploy1002 and mw2254
  • 06:29 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 06:27 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 06:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 06:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 06:02 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet
  • 05:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:56 elukey: powercycle mw2280 - no tty available in mgmt, no ssh, host frozen
  • 05:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
  • 05:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:45 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:12 marostegui: Repool clouddb1017:3311
  • 05:12 marostegui: Repool clouddb1013:3311
  • 04:49 marostegui: Depool clouddb1013:3311
  • 04:49 marostegui: Depool clouddb1017:3311
  • 02:52 eileen: civicrm revision changed from 83f514f693 to 1f071f6c6c, config revision is 23eda8ba3a
  • 00:35 tgr: Deployed patch for T290692

2021-09-09

  • 23:07 brennen: no takers on patches, ending backport & config training window.
  • 21:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:40 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:37 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bc4f204: Growth: Push 44 wikis out of dark mode (T289680) (duration: 00m 57s)
  • 18:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 3/3) (duration: 00m 57s)
  • 18:22 urbanecm@deploy1002: Synchronized wmf-config/config/: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 2/3) (duration: 01m 01s)
  • 18:21 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582; 1/3) (duration: 00m 58s)
  • 18:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 18:20 urbanecm@deploy1002: sync-file aborted: 6af38d9: Deploy Growth features in dark modes to ~200 wikis (T290582) (duration: 00m 05s)
  • 18:18 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 18:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 18:16 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
  • 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php --phab=T290582 | tee ~/initwikiconfig.out # T290582
  • 18:11 urbanecm: Run extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments for wikis in P17258 (T290582)
  • 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/config: no-op: 76c51f2: Standardize indentation in several .yaml files (duration: 00m 58s)
  • 17:29 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 17:28 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 17:28 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 17:26 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 17:25 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 17:22 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 17:21 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 17:20 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 17:14 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-09-09 17:14:12.502162
  • 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 17:12 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 17:12 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-09-09 17:12:27.974410
  • 17:12 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 17:08 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 17:07 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 17:07 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 17:04 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 17:04 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:58 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 16:57 jelto: start cookbook sre.switchdc.mediawiki eqiad codfw --live-test this will generate some additional SAL logs here
  • 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:10 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
  • 16:00 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 15:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
  • 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:28 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: pipeline: add comment redirecting to correct file (duration: 00m 59s)
  • 15:24 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
  • 14:47 mutante: planet - deleting all state and lock files for the "en" feeds (T285251 T289984)
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2002.wikimedia.org
  • 14:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 14:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 14:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 14:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
  • 13:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mx2002.wikimedia.org
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:11 mutante: planet1002 - re-enabling disabled puppet
  • 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
  • 10:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
  • 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
  • 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
  • 10:47 topranks: Removing peering to old IPs of AS139931 (BSCCL) at Equinix Singapore (cr3-eqsin).
  • 10:45 topranks: Removing peering to AS24218 at Equinix Singapore (cr3-eqsin) - network no longer uses this ASN.
  • 10:22 volans: upgrading spicerack on cumin1001
  • 10:20 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
  • 10:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
  • 09:47 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
  • 09:46 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 09:37 godog: swift eqiad add ms-be10[64-67] with initial weight - T290546
  • 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
  • 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 09:15 volans: rebooting sretest1001 to test ipmi reboot via spicerack
  • 09:15 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
  • 09:15 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
  • 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
  • 09:09 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
  • 08:59 godog: move swift traffic fully to codfw to rebalance eqiad - T287539
  • 08:59 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
  • 08:58 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
  • 08:56 volans: upgrading spicerack on cumin2002 to test the new release
  • 08:50 volans: uploaded spicerack_0.0.59 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 08:23 jelto: run ansible change 719041 on gitlab1001
  • 08:13 jelto: run ansible change 719041 on gitlab2001
  • 07:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1002.eqiad.wmnet
  • 06:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1002.eqiad.wmnet
  • 04:37 ryankemper: [WDQS] Dispatched e-mail to the banned user agent (dailymotion)
  • 03:57 ryankemper: [WDQS] Dispatched e-mail to WDQS public mailing list informing them the outage is over; all that's left is the e-mail to the banned UA
  • 03:47 ryankemper: [WDQS] Restarting `wdqs-blazegraph` on `wdqs[2001-2008].codfw.wmnet`; if banning the dailymotion UA was sufficient then servers should come back up healthy and not drop back into deadlock
  • 03:43 ryankemper: [WDQS] Running puppet agent on `wdqs[2001-2008].codfw.wmnet` to roll out https://gerrit.wikimedia.org/r/719753
  • 03:29 ryankemper: [WDQS] There's no clear indication of them being a culprit, but by far the most common user agent is a dailymotion VideocatalogTopic UA (see https://logstash.wikimedia.org/goto/51f238e9010d0220e5d33c6c210be93e)
  • 03:12 bstorm: attempting to start replication on clouddb1017 s1 T290630
  • 03:11 bstorm: stopping and restarting mariadb on clouddb1017 s1
  • 03:04 ryankemper: [WDQS] Dispatched email to Wikidata public mailing list about reduced service availability
  • 02:36 ryankemper: [WDQS] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&from=1631152574841&to=1631154942992 shows the availability pattern, anywhere we see missing data (null) represents time that blazegraph was locked up and therefore unable to report metrics
  • 02:34 ryankemper: [WDQS] For context I glanced at `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo systemctl status wdqs-blazegraph'` before doing the aforementioned restarts and they'd all last restarted between 25-28 minutes ago
  • 02:33 ryankemper: [WDQS] Restarting `wdqs-blazegraph` across all of `wdqs2*`
  • 00:50 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Don't set default to Score (try #2) (duration: 00m 58s)
  • 00:48 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/Score/includes/Score.php: Use the 'score' Shellbox if configured (T290193) (duration: 00m 57s)
  • 00:46 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/includes/shell/CommandFactory.php: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand (T290193) (duration: 00m 58s)
  • 00:45 legoktm@deploy1002: sync-file aborted: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand (T290193 (duration: 00m 07s)
  • 00:15 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove putenv() for GDFONTPATH (duration: 00m 58s)

2021-09-08

  • 22:34 ryankemper: WDQS] T280247 Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/717649
  • 22:24 ryankemper: WDQS] T280247 Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/714623
  • 21:55 ryankemper: [WDQS] T280247 Purged varnish to make sure change took effect: `echo 'https://query-preview.wikidata.org/' | mwscript purgeList.php` and `echo 'https://query.wikidata.org/' | mwscript purgeList.php` on `mwmaint1002`
  • 21:53 ryankemper: [WDQS] T280247 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719502 and ran puppet-agent on `miscweb*`
  • 20:49 eileen: civicrm revision changed from 593d01f4fc to 83f514f693, config revision is 23eda8ba3a
  • 20:41 legoktm: Successfully published image docker-registry.discovery.wmnet/php7.2-fpm-multiversion-base:1.0.2
  • 19:25 Krinkle: krinkle@mw1369 Running some benchmarks in Eqiad on load.php
  • 18:27 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: 6bcbe61: Italian Wikipedia is now a group 1 wiki (T286664; 2/2) (duration: 00m 58s)
  • 18:26 urbanecm@deploy1002: Synchronized dblists/: 6bcbe61: Italian Wikipedia is now a group 1 wiki (T286664; 1/2) (duration: 00m 58s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bbefce6: Growth: Remove config that moved on-wiki (T290295) (duration: 00m 58s)
  • 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 950a377: Stop setting $wgAbuseFilterParserClass (T239990) (duration: 00m 58s)
  • 17:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2004.codfw.wmnet
  • 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2004.codfw.wmnet
  • 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2003.codfw.wmnet
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2003.codfw.wmnet
  • 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2001.codfw.wmnet
  • 16:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 796e23c: updateMenteeData.php: Make it possible to force update (duration: 00m 58s)
  • 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Turn off jQuery migrate on wikisource wikis (T280944) (duration: 00m 59s)
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2001.codfw.wmnet
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
  • 16:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 16:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 15:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 15:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 14:57 marostegui: Retroactive: started to warm up eqiad databaes
  • 14:57 moritzm: installing 4.19.194 kernels on stretch systems with 4.19.x (no reboots yet)
  • 14:54 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.2.3 (T289802)
  • 14:53 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
  • 14:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
  • 14:33 moritzm: installing zeromq3 security updates
  • 13:50 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15 (duration: 06m 42s)
  • 13:44 mbsantos@deploy1002: Started deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15
  • 13:38 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.1.5 (T289802)
  • 13:13 brennen: gitlab1001: downtiming alerts for 2.5 hours; upgrading to 14.0.10 (T289802)
  • 12:45 brennen: gitlab: pausing all runners in preparation for upgrade to 14.0.10 (T289802)
  • 11:57 moritzm: installing curl security updates on stretch
  • 11:09 jbond: upload statograph_0.1.2
  • 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:06 jelto: upgrade gitlab2001 to gitlab-ce=14.0.10-ce.0
  • 10:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
  • 10:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
  • 09:38 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - T210137
  • 09:29 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - T210137
  • 09:09 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - T210137
  • 07:45 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - T210137
  • 06:46 ryankemper: [WDQS] Manually running puppet-agent on `miscweb2002.codfw.wmnet,miscweb1002.eqiad.wmnet`
  • 06:45 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719185 to rollback query.wikidata.org changes
  • 02:59 eileen: civicrm revision changed from 06ef98593f to 593d01f4fc, config revision is 5f004d94d7
  • 00:00 legoktm: legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of 4869d91b0be / T282303

2021-09-07

  • 23:25 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:20 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 23:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable UrlShortener everywhere (T267925) (duration: 00m 58s)
  • 23:07 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Config: profiler: use seperate pipeline inside k8s pods (T288165) (duration: 00m 58s)
  • 22:29 cstone: SmashPig revision changed from afd362b163 to 3607b16f83
  • 20:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on all wikis (T251480) (duration: 00m 59s)
  • 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:18 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:01 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 16:39 moritzm: installing jetty9 security updates on buster
  • 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 16:30 dancy@deploy1002: Synchronized README: testing (duration: 00m 59s)
  • 15:18 akosiaris: run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests
  • 15:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:04 jbond: upload python-prometheus-client_0.6.0 to stretch-wikimedia
  • 14:50 mutante: snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
  • 14:33 mutante: CI - migrating zuul-merger cronjob to systemd timer (contint*)
  • 14:23 XioNoX: re-pool esams-eqiad - T288503
  • 14:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
  • 14:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
  • 14:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
  • 14:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
  • 14:17 marostegui: No more db maintenance on eqiad T288594
  • 14:08 mutante: alert1001 - temp disabled puppet, stopped icinga-wm
  • 14:07 mutante: temp killed icinga-wm because of flooding
  • 14:01 Emperor: removing pc2010 from orchestrator T289117
  • 13:59 Emperor: removing pc2010 from tendril and zarcillo T289117
  • 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 XioNoX: drain esams-eqiad for circuit maintenance - T288503
  • 13:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 jayme: uncordoned kubestage2001
  • 13:50 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:49 mutante: mw2264 - scap pulled and repooled after T290242
  • 13:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
  • 13:43 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2010.codfw.wmnet
  • 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2010.codfw.wmnet
  • 13:21 Emperor: removing pc2009 from orchestrator T289116
  • 13:21 Emperor: removing pc2009 from tendril and zarcillo T289116
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'fix s8 weights T288594', diff saved to https://phabricator.wikimedia.org/P17248 and previous config saved to /var/cache/conftool/dbconfig/20210907-130244-marostegui.json
  • 12:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2009.codfw.wmnet
  • 12:51 mvernon@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove old decommissioned pc hosts T284825 (duration: 01m 02s)
  • 12:45 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2009.codfw.wmnet
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17247 and previous config saved to /var/cache/conftool/dbconfig/20210907-122747-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17246 and previous config saved to /var/cache/conftool/dbconfig/20210907-122708-marostegui.json
  • 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 11:46 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 11:36 awight: EU backport complete
  • 11:33 awight@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/CodeMirror/extension.json: Backport: Change line numbers default to null (T290226) (duration: 00m 59s)
  • 11:28 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set template namespace for code mirror line numbering (T290226) (duration: 00m 59s)
  • 10:51 Emperor: removing pc2008 from orchestrator T289115
  • 10:49 Emperor: removing pc2008 from tendril and zarcillo T289115
  • 10:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2008.codfw.wmnet
  • 10:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2008.codfw.wmnet
  • 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
  • 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
  • 10:27 Emperor: removing pc1010 from orchestrator T289122
  • 10:22 Emperor: removing pc1010 from tendril and zarcillo T289122
  • 10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1010.eqiad.wmnet
  • 10:02 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1010.eqiad.wmnet
  • 09:46 Emperor: removing pc1009 from orchestrator T289120
  • 09:26 Emperor: removing pc1009 from tendril and zarcillo T289120
  • 09:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1009.eqiad.wmnet
  • 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1009.eqiad.wmnet
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:51 Emperor: removing pc1008 from orchestrator T289119
  • 08:44 Emperor: removing pc1008 from tendril and zarcillo T289119
  • 08:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1008.eqiad.wmnet
  • 08:31 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1008.eqiad.wmnet
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
  • 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json
  • 07:52 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17239 and previous config saved to /var/cache/conftool/dbconfig/20210907-075235-kormat.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json
  • 07:37 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17236 and previous config saved to /var/cache/conftool/dbconfig/20210907-073731-kormat.json
  • 07:37 godog: +100G for prometheus/k8s codfw
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Start to pool db2090 into API T288803', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json
  • 07:22 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 50%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17233 and previous config saved to /var/cache/conftool/dbconfig/20210907-072227-kormat.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json
  • 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 07:07 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster (now with fixed pool config) T288244', diff saved to https://phabricator.wikimedia.org/P17231 and previous config saved to /var/cache/conftool/dbconfig/20210907-070724-kormat.json
  • 07:07 kormat@cumin1001: dbctl commit (dc=all): 'Fixing db2118's pooling config T288244', diff saved to https://phabricator.wikimedia.org/P17230 and previous config saved to /var/cache/conftool/dbconfig/20210907-070702-kormat.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool T288803', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json
  • 05:15 marostegui: Optimize eowiki.flaggedtemplates in eqiad T290057
  • 05:15 marostegui: Optimize vecwiki.flaggedtemplates in eqiad T290057
  • 05:14 marostegui: Optimize kawiki.flaggedtemplates in eqiad T290057

2021-09-06

  • 23:52 tstarling@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/SecurePoll/includes/Talliers/STVTallier.php: T290000 (duration: 00m 58s)
  • 16:14 Amir1: Deployed patch for T290394
  • 15:01 Emperor: removing pc1007 from orchestrator T289118
  • 15:00 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:53 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster T288244', diff saved to https://phabricator.wikimedia.org/P17226 and previous config saved to /var/cache/conftool/dbconfig/20210906-145341-kormat.json
  • 14:50 Emperor: removing pc1007 from tendril and zarcillo T289118
  • 14:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1007.eqiad.wmnet
  • 14:45 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1026.eqiad.wmnet
  • 14:44 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1026.eqiad.wmnet
  • 14:36 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
  • 14:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1007.eqiad.wmnet
  • 14:22 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 14:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set permission of creating short url to everyone everywhere (T267921 T267925), Part II (duration: 00m 57s)
  • 14:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Set permission of creating short url to everyone everywhere (T267921 T267925), Part I (duration: 00m 59s)
  • 14:12 moritzm: installing postgres 9.6 security updates
  • 14:05 gehel: re-pooling wdqs1007, catched up on lag
  • 13:56 jbond: update facter networking fact gerrit:715949
  • 13:51 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: fix comment for rdb* servers (duration: 00m 58s)
  • 13:42 moritzm: updated thirdparty/gitlab component to 14.0.10 T284811
  • 13:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 12:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 12:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:06 godog: silence statograph until thurs on alert1001 - T290425
  • 11:58 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=plwiki 'editor' 'editeditorprotected' # T230103
  • 11:56 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki={hewiki,lvwiki,srwiki,srwikibooks} 'autopatrol' 'editautopatrolprotected' # T230103
  • 11:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' # T230103
  • 11:50 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=dewiktionary 'autoreviewprotected' 'editautoreviewprotected' # T230103
  • 11:48 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=arwiki 'autoreview' 'editautoreviewprotected' # T230103
  • 11:07 urbanecm: EU B&C window done
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c8d7cf8: foundationwiki: Create editor group (T205352) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f90862b: Growth: Define wgGEMentorDashboardDiscoveryEnabled (T289054) (duration: 00m 58s)
  • 11:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/maintenance/renameRestrictions.php: 18e43ec: renameRestrictions.php: Update protected_titles as well (T290398) (duration: 00m 59s)
  • 10:39 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
  • 10:38 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 10:22 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
  • 10:17 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 09:22 gehel: depooling wdqs1007, catching up on lag
  • 09:06 gehel: restart blazegraph and updater on wdqs1007
  • 08:46 jbond: update networking fact - gerrit:715943
  • 07:57 godog: fail sdw on ms-be1062, reported errors
  • 07:51 moritzm: installing libssh security updates
  • 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:44 moritzm: installing squashfs-tools security updates
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:28 marostegui: Optimize table mkwiki.flaggedtemplates in eqiad T290057
  • 06:26 marostegui: Optimize table bewiki.flaggedtemplates in eqiad T290057
  • 06:23 marostegui: Optimize table dewiki.flaggedtemplates in eqiad T290057
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
  • 05:07 marostegui: Stop replication on db2090 (old s4 master) T289650 T288803
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 (current master) from API T289650', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2090 T289650', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write T289650', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T289650', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json
  • 05:00 marostegui: Starting s4 codfw failover from db2090 to db2110 - T289650
  • 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T289650', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json
  • 04:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650
  • 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650

2021-09-05

  • 18:54 urbanecm: wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # T290396

2021-09-04

  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json
  • 09:04 elukey: restart wmf_auto_restart_rsyslog.service on puppetdb1002
  • 09:00 elukey: `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026
  • 03:02 rzl@cumin2001: dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json

2021-09-03

  • 21:49 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 20:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 19:33 krinkle@deploy1002: Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s)
  • 19:33 krinkle@deploy1002: Started deploy [integration/docroot@6492b3d]: I48480e89e5f6
  • 19:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 19:04 ryankemper: T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)
  • 17:42 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 17:40 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 17:35 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 17:17 ryankemper: T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw
  • 16:32 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 16:10 gehel: blazegraph (public cofdfw cluster) will now restart every hour - T290330
  • 15:53 jbond: enable puppet fleet wide to post puppetdb database maintance - T263578
  • 15:21 jbond: create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - T263578
  • 15:17 jbond: create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - T263578
  • 15:00 jbond: disable puppet fleet wide to preform puppetdb database maintance - T263578
  • 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:20 mutante: mw2264 - scap pull
  • 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:11 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
  • 13:10 dcausse: installing openjdk-8-dbg on wdqs2007
  • 13:04 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
  • 13:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet
  • 12:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet
  • 12:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet
  • 12:32 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet
  • 12:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet
  • 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)
  • 12:03 joal@deploy1002: Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]
  • 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)
  • 11:56 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)
  • 11:44 joal@deploy1002: Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]
  • 11:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - T289050
  • 11:37 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
  • 11:36 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)
  • 11:35 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet
  • 10:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet
  • 10:47 joal@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)
  • 10:46 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures
  • 10:45 joal@deploy1002: deploy aborted: Deploy latest code on AQS new servers - test after failures (duration: 00m 05s)
  • 10:45 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-test): Deploy latest code on AQS new servers - test after failures
  • 10:29 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 03s)
  • 10:29 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:22 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 55s)
  • 10:21 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:17 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
  • 10:16 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:08 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 45s)
  • 10:08 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:05 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
  • 10:04 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:02 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 25s)
  • 10:01 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 10:00 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 53s)
  • 09:58 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 09:57 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 09s)
  • 09:57 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
  • 09:32 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979] (duration: 00m 07s)
  • 09:32 joal@deploy1002: Started deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979]
  • 09:26 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979] (duration: 17m 36s)
  • 09:25 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1025-1026].eqiad.wmnet
  • 09:15 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1022.eqiad.wmnet
  • 09:13 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:09 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 09:09 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:09 joal@deploy1002: Started deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979]
  • 09:08 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:06 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 09:03 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:03 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:53 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:52 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:45 ema: cp-eqsin: clean apt cache to free up some space T290305
  • 08:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1022.eqiad.wmnet
  • 08:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 07:43 legoktm: uploaded pygments 2.10.0+dfsg-1~wmf1 to apt.wm.o in component/pygments
  • 07:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from severak s3 wikis - T289050
  • 07:10 godog: more weight to ms-be20[62-65] - T288458
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:57 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:45 elukey: run `apt-get clean` on cp5012 to free some space (94% of the root partition used)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17203 and previous config saved to /var/cache/conftool/dbconfig/20210903-061204-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17202 and previous config saved to /var/cache/conftool/dbconfig/20210903-061138-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17201 and previous config saved to /var/cache/conftool/dbconfig/20210903-055700-root.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17200 and previous config saved to /var/cache/conftool/dbconfig/20210903-055635-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17199 and previous config saved to /var/cache/conftool/dbconfig/20210903-054157-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17198 and previous config saved to /var/cache/conftool/dbconfig/20210903-054131-root.json
  • 05:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts pc2007.codfw.wmnet
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17196 and previous config saved to /var/cache/conftool/dbconfig/20210903-052653-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17195 and previous config saved to /var/cache/conftool/dbconfig/20210903-052628-root.json
  • 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2007.codfw.wmnet
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17194 and previous config saved to /var/cache/conftool/dbconfig/20210903-051149-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17193 and previous config saved to /var/cache/conftool/dbconfig/20210903-051124-root.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2138 for upgrade', diff saved to https://phabricator.wikimedia.org/P17192 and previous config saved to /var/cache/conftool/dbconfig/20210903-050423-marostegui.json
  • 00:31 tgr@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: Backport: fixLinkRecommendationData: Try harder to avoid >10K result sets (T284531) (duration: 00m 58s)
  • 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-09-02

  • 23:12 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Adding wordmark for ptwikinews mobile and desktop skins (T281591) Part II (duration: 00m 57s)
  • 23:11 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikinews-wordmark-pt.svg: Config: Adding wordmark for ptwikinews mobile and desktop skins (T281591) Part I (duration: 01m 14s)
  • 21:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 21:37 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 21:17 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
  • 19:57 ejegg: updated fundraising CiviCRM from 7ac13753c7 to 06ef98593f
  • 19:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1021.eqiad.wmnet
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:40 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1021.eqiad.wmnet
  • 19:28 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.21 refs T281162
  • 18:31 ryankemper: [WCQS] `wcqs100[1-3],wcqs200[1-3]` downtimed until `2021-09-09 20:29:55` (UTC)
  • 18:28 ryankemper: [WCQS] Merged & deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/713946, going to suppress icinga alerts on `wcqs*` hosts because these are still in the process of being spun up properly and aren't serving traffic or anything
  • 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 17:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:57 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:18 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:09 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1020.eqiad.wmnet
  • 15:53 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1020.eqiad.wmnet
  • 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1019.eqiad.wmnet
  • 15:31 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 15:28 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 15:26 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1019.eqiad.wmnet
  • 15:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mc1033.eqiad.wmnet
  • 15:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1034.eqiad.wmnet
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17178 and previous config saved to /var/cache/conftool/dbconfig/20210902-150412-root.json
  • 14:50 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1034.eqiad.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17177 and previous config saved to /var/cache/conftool/dbconfig/20210902-144908-root.json
  • 14:49 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1033.eqiad.wmnet
  • 14:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:38 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 14:35 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17176 and previous config saved to /var/cache/conftool/dbconfig/20210902-143405-root.json
  • 14:33 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:32 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 14:22 moritzm: installing exiv2 security updates
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17175 and previous config saved to /var/cache/conftool/dbconfig/20210902-141901-root.json
  • 14:13 moritzm: installing ffmpeg security updates
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17174 and previous config saved to /var/cache/conftool/dbconfig/20210902-140357-root.json
  • 14:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:57 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 for upgrade', diff saved to https://phabricator.wikimedia.org/P17173 and previous config saved to /var/cache/conftool/dbconfig/20210902-134838-marostegui.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17172 and previous config saved to /var/cache/conftool/dbconfig/20210902-134448-root.json
  • 13:42 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:41 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:36 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:35 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17171 and previous config saved to /var/cache/conftool/dbconfig/20210902-132945-root.json
  • 13:29 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:24 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 13:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 13:14 jbond: reimage sretest1002 (not sretest1001)
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17169 and previous config saved to /var/cache/conftool/dbconfig/20210902-131441-root.json
  • 13:14 jbond: reimage sretest1001
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17168 and previous config saved to /var/cache/conftool/dbconfig/20210902-125937-root.json
  • 12:55 jbond: disable puppet fleet wide to roll out 715728
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17167 and previous config saved to /var/cache/conftool/dbconfig/20210902-124434-root.json
  • 12:42 marostegui: Upgrade db2119
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17166 and previous config saved to /var/cache/conftool/dbconfig/20210902-124102-marostegui.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17165 and previous config saved to /var/cache/conftool/dbconfig/20210902-122826-root.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17164 and previous config saved to /var/cache/conftool/dbconfig/20210902-121323-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17163 and previous config saved to /var/cache/conftool/dbconfig/20210902-115819-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17162 and previous config saved to /var/cache/conftool/dbconfig/20210902-114315-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17161 and previous config saved to /var/cache/conftool/dbconfig/20210902-112812-root.json
  • 11:26 urbanecm@deploy1002: Synchronized README: testing scap (duration: 01m 06s)
  • 11:22 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2264.codfw.wmnet
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 for upgrade', diff saved to https://phabricator.wikimedia.org/P17160 and previous config saved to /var/cache/conftool/dbconfig/20210902-111843-marostegui.json
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3ce5d80: dewiki: Enable Growth features for 30% of newcomers (T288420) (duration: 01m 58s)
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:04 urbanecm: metawiki: Server-side page move from VRT -> Volunteer Response Team (T290083)
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17158 and previous config saved to /var/cache/conftool/dbconfig/20210902-110022-root.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17155 and previous config saved to /var/cache/conftool/dbconfig/20210902-104518-root.json
  • 10:38 mbsantos: REINDEX database gis in maps1009 while it's in depooled state
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17152 and previous config saved to /var/cache/conftool/dbconfig/20210902-103014-root.json
  • 10:24 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:23 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:19 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17150 and previous config saved to /var/cache/conftool/dbconfig/20210902-101511-root.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17147 and previous config saved to /var/cache/conftool/dbconfig/20210902-100007-root.json
  • 09:57 marostegui: Upgrade db2073
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2073 for upgrade', diff saved to https://phabricator.wikimedia.org/P17145 and previous config saved to /var/cache/conftool/dbconfig/20210902-095601-marostegui.json
  • 09:56 hashar@deploy1002: Finished deploy [integration/docroot@973ac8a]: Support listing files on index pages - T289196 (duration: 00m 07s)
  • 09:55 hashar@deploy1002: Started deploy [integration/docroot@973ac8a]: Support listing files on index pages - T289196
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17142 and previous config saved to /var/cache/conftool/dbconfig/20210902-092026-root.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17141 and previous config saved to /var/cache/conftool/dbconfig/20210902-090523-root.json
  • 08:55 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from eowiki,idwiki,plwiki,trwiki - T289050
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17140 and previous config saved to /var/cache/conftool/dbconfig/20210902-085019-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17138 and previous config saved to /var/cache/conftool/dbconfig/20210902-083515-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17136 and previous config saved to /var/cache/conftool/dbconfig/20210902-082012-root.json
  • 08:14 marostegui: Upgrade db2140
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 for upgrade', diff saved to https://phabricator.wikimedia.org/P17135 and previous config saved to /var/cache/conftool/dbconfig/20210902-081436-marostegui.json
  • 07:57 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 07:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on huwiki - T289050
  • 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on arwiki - T289050
  • 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:00 marostegui: Stop mariadb on pc2007 before decommissioning T289112
  • 06:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove pc2007 T289112 (duration: 01m 06s)
  • 06:13 eileen: civicrm revision changed from ad37f21a7d to 7ac13753c7, config revision is 5f004d94d7
  • 04:50 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on ruwiki - T289050
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:05 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: I63bf19 (duration: 01m 07s)

2021-09-01

  • 23:50 Amir1: mwscript createAndPromote.php --wiki=test2wiki --sysop --force Ladsgroup
  • 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 0bd6542: fixLinkRecommendationData: stay under 10K search limit (T284531) (duration: 01m 06s)
  • 23:27 eileen: civicrm revision changed from 30cd9c1d90 to ad37f21a7d, config revision is 5f004d94d7
  • 23:25 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 3c7d4ec: fixLinkRecommendationData: Allow --db-table in dry-run mode (T283868) (duration: 01m 06s)
  • 23:20 urbanecm@deploy1002: Synchronized wmf-config/extension-list: 91ff927: Enable NearbyPages on beta cluster (T246493; 3/3) (duration: 01m 05s)
  • 23:19 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 91ff927: Enable NearbyPages on beta cluster (T246493; 2/3) (duration: 01m 06s)
  • 23:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 91ff927: Enable NearbyPages on beta cluster (T246493; 1/3) (duration: 01m 06s)
  • 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:15 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
  • 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bb7d92c: Enable WVUI search on Wikimedia Commons (T287215) (duration: 01m 07s)
  • 23:04 dpifke@deploy1002: Finished deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts T281243 (duration: 00m 06s)
  • 23:04 dpifke@deploy1002: Started deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts T281243
  • 22:44 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 22:43 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 22:43 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 22:43 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 22:42 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:42 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 22:40 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 22:39 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 22:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 22:33 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 22:33 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 22:32 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:32 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 22:30 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 22:29 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:57 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.21 refs T281161 (duration: 01m 06s)
  • 19:57 twentyafterfour: twentyafterfour@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21 refs T281162
  • 19:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21 refs T281161
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fe1ae2e: Growth features: Deploy to 100% of newcomers on small wikis (T289786) (duration: 01m 06s)
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 27e85b1: nlwiki: Enable link recommendations for all Growth users (T285254) (duration: 01m 06s)
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 94b1cca: Growth features: Enable for newcomers on two wikis (T285254, T287867) (duration: 01m 09s)
  • 17:31 ejegg: updated payments-wiki from c4d56178d0 to f9cbf95a12
  • 16:23 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071] (duration: 00m 06s)
  • 16:23 mforns@deploy1002: Started deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071]
  • 16:22 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071] (duration: 26m 58s)
  • 16:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
  • 16:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
  • 16:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
  • 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
  • 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
  • 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
  • 15:55 mforns@deploy1002: Started deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071]
  • 15:35 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:08 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:04 godog: move simone-this-dot from wmf to nda ldap group - T289783
  • 13:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.20/includes/resourceloader: Id7c258 (duration: 01m 06s)
  • 13:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/includes/resourceloader: Id7c258 (duration: 01m 49s)
  • 13:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:16 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:05 mutante: planet1002 - temp removing feed from ad.huikeshoven - seems to cause corrupt state file (T289984)
  • 13:01 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 12:48 godog: s/webperf/navtiming/
  • 12:47 godog: bounce webperf on webperf2001 - T290138
  • 12:41 mutante: planet1002 - rm /etc/rawdog/en/feeds/39a7970f.state (corrupt) T289984
  • 12:38 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 11:19 Krinkle: effie restarted php-fpm on parse2007.codfw.wmnet, ref T290120.
  • 10:21 jbond: start filtering more puppet facts G:715461 - T263578
  • 09:23 marostegui: Drop flaggedrevs_stats and flaggedrevs_stats2 from dewiki T289050
  • 07:45 ema: deploy Varnish SLO dashboard with grr apply slo_dashboards.jsonnet T289036
  • 07:05 XioNoX: pfw NAT and ACLs changes - T290077
  • 06:29 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
  • 06:28 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
  • 05:25 effie: depool mw2251 mw2255 parse2001 for tests - T280497
  • 04:41 marostegui: Optimize idwiki.flaggedtemplates T290057
  • 04:23 marostegui: Optimize arwiki.flaggedtemplates T290057
  • 04:16 eileen: civicrm revision changed from 7da3eba4f9 to 30cd9c1d90, config revision is 5f004d94d7
  • 00:53 eileen: civicrm revision changed from e567b4c289 to 7da3eba4f9, config revision is 5f004d94d7

2021-08-31

  • 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 eileen: civicrm revision changed from 718aa9cad3 to e567b4c289, config revision is 7a24870bc7
  • 23:33 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Revert excimer-k8s pipelines T288165 (duration: 01m 14s)
  • 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:25 dpifke@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 23:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:15 mforns: failed deployment of refinery (v0.1.17) to an-test-coord1001.eqiad.wmnet (scap error)
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:14 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b] (duration: 13m 42s)
  • 23:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1437d99: Enable link recommendation frontent in dewiki and nlwiki (T288420, T285254) (duration: 01m 06s)
  • 23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8997ae5: Fix wgDiscussionTools_sourcemodetoolbar settings (duration: 01m 22s)
  • 23:01 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b]
  • 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b] (duration: 00m 07s)
  • 23:00 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b]
  • 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b] (duration: 17m 39s)
  • 22:42 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b]
  • 21:58 ejegg: switched Adyen to new Checkout integration
  • 21:41 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:38 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:34 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.21 refs T281161
  • 19:20 brennen: gitlab1001: brief downtime for testing reconfiguration of cas3.session_duration
  • 19:05 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.21 refs T281161 (duration: 35m 53s)
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:40 ejegg: switched Adyen back to HPP integration
  • 18:38 ejegg: updated payments-wiki from 564daed816 to c4d56178d0, switched Adyen to Checkout integration
  • 18:30 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.21 refs T281161
  • 18:24 twentyafterfour: ran `scap prep 1.37.0-wmf.21` and `scap apply-patches --train 1.37.0-wmf.21` refs T281162
  • 18:05 XioNoX: re-pool eqsin-codfw link
  • 16:18 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:14 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@09156c2]: fix core Title redirect loop (duration: 16m 02s)
  • 15:52 hnowlan@deploy1002: Started deploy [restbase/deploy@09156c2]: fix core Title redirect loop
  • 14:30 jbond: enable puppet fleet wide to post preform puppetdb maintance T263578
  • 14:29 hashar: Restarting CI Jenkins for plugins upgrade
  • 14:19 ottomata: merged change to service_auto_restart.pp that changes the way service names are matched to be more explicit. tested in deployment prep and nothing bad happened. Logging in case something bad does happen in prod. https://gerrit.wikimedia.org/r/c/operations/puppet/+/697605
  • 14:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:09 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:05 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:03 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - T289779
  • 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - T289779
  • 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - T289779
  • 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - T289779
  • 14:01 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:47 jbond: disable puppet fleet wide to preform puppetdb maintance T263578
  • 13:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:37 urbanecm: Start `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=nlwiki --verbose` in a tmux session at mwmaint2002
  • 13:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
  • 13:06 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:04 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 12:59 urbanecm: [urbanecm@mwmaint2002 ~]$ sudo -u www-data kill 133282 # stop updateMenteeData.php at frwiki
  • 12:52 jelto: run kubectl scale deployments.apps -n ci mediawiki-bruce --replicas=0 to stop ImagePulling and reduce io on kubestage1001
  • 12:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 12:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb2002.codfw.wmnet T263578
  • 11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb1002.eqiad.wmnet T263578
  • 11:37 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb2002.codfw.wmnet
  • 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 53a1856: updateMenteeData: Send timing to statsd (T278971) (duration: 00m 57s)
  • 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:07 urbanecm: EU B&C window done
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb482e3: Offer the DiscussionTools reply tool as opt-out setting at 21 phase 2 Wikipedias (T288483) (duration: 00m 57s)
  • 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
  • 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
  • 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
  • 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 10:14 marostegui: Optimize huwiki.flaggedtemplates T290057
  • 10:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 08:39 marostegui: Optimize plwiki.flaggedtemplates T290057
  • 08:18 marostegui: Optimize cewiki.flaggedtemplates T290057
  • 08:05 marostegui: Optimize plwiktionary.flaggedtemplates T290057
  • 07:44 marostegui: Optimize ruwiki.flaggedtemplates T290057
  • 07:01 XioNoX: drain eqsin-codfw link
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17113 and previous config saved to /var/cache/conftool/dbconfig/20210831-065600-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17112 and previous config saved to /var/cache/conftool/dbconfig/20210831-064056-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17111 and previous config saved to /var/cache/conftool/dbconfig/20210831-062553-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17110 and previous config saved to /var/cache/conftool/dbconfig/20210831-061049-root.json
  • 06:06 marostegui: Rename flaggedrevs_stats2 and flaggedrevs_stats on dewiki codfw T289050
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Slowly repool after reimage T288803', diff saved to https://phabricator.wikimedia.org/P17109 and previous config saved to /var/cache/conftool/dbconfig/20210831-055546-root.json
  • 03:39 eileen: civicrm revision changed from e89504652a to 718aa9cad3, config revision is cb0a008cad
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:04 eileen: tools revision changed from 14e4125f73 to 1d67c52c12

2021-08-30

  • 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:11 urbanecm: Evening B&C done
  • 23:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: 9e2264a: Instrument Special:MentorDashboard (T289369) (duration: 00m 55s)
  • 23:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: 9e2264a: Instrument Special:MentorDashboard (T289369) (duration: 00m 57s)
  • 21:56 eileen: civicrm revision changed from 13bf3a02df to e89504652a, config revision is cb0a008cad
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a92e2a: Fix mediawiki.mentor_dashboard.visits definition (duration: 00m 56s)
  • 19:08 tgr: morning deploys done for real
  • 19:06 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix schema definition for mediawiki.mentor_dashboard.visit (T289369) (duration: 00m 56s)
  • 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:49 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: Add mediawiki.mentor_dashboard.visit schema (T289369) (duration: 00m 26s)
  • 18:48 tgr@deploy1002: Scap failed!: 5/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 18:43 tgr: morning deploys done
  • 18:43 tgr@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:22 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable link recommendation for dewiki and nlwiki (T288420 T285254) (duration: 00m 56s)
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Switch image recommendations flag off (T288797) (duration: 00m 57s)
  • 17:44 ryankemper: [WDQS Deploy] Test query passing on `query.wikidata.org` and icinga looks good. This deploy is done.
  • 17:12 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:10 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a17833c]: 0.3.84 (duration: 08m 16s)
  • 17:04 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.84` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a17833c]: 0.3.84
  • 17:02 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.84`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:00 ryankemper: T289483 Pooled `wdqs1013`
  • 16:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
  • 16:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
  • 16:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
  • 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
  • 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
  • 16:16 sukhe: running authdns-update for Gerrit 715499
  • 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 14:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
  • 14:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
  • 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
  • 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
  • 14:18 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b170153: Growth mentor dashboard: Enable beta features only on beta wikis (T280307) (duration: 00m 55s)
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a178e: knwiki: Disable wmgNewUserMessageOnAutoCreate (T289333) (duration: 00m 57s)
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:48 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 6fbcc93: Add missing edit*protected rights to $wgAvailableRights (duration: 00m 56s)
  • 12:12 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=jvwikisource --backend=local-multiwrite (T289860)
  • 11:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:51 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:48 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:47 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:31 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:30 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:53 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:21 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:34 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgIncludejQueryMigrate to false in group0 (T280944) (duration: 00m 57s)
  • 09:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 09:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
  • 09:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 09:00 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 08:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
  • 08:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 08:57 godog: +100G to prometheus/global in codfw
  • 08:04 vgutierrez: pool cp2027 - T289908
  • 06:53 elukey: drop an-airflow1001's old airflow logs to fix root partition almost filled up
  • 06:38 godog: more weight to ms-be20[62-65] - T288458
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 for reimage T288803', diff saved to https://phabricator.wikimedia.org/P17105 and previous config saved to /var/cache/conftool/dbconfig/20210830-052336-marostegui.json

2021-08-29

  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-28

  • 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:12 elukey: powercycle cp2027 - OEM event registered in racadm getsel, no tty, no ssh
  • 09:11 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet

2021-08-27

  • 16:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 14:50 akosiaris: stop flink on staging cluster to verify some IOPS starvation issues
  • 14:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 14:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 14:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 14:37 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
  • 13:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 12:49 mutante: rsynced /srv/org/wikimedia/racktables from miscweb1002 to miscweb2002 (T269746)
  • 12:04 topranks: removing peering to Wave Division Holdings / AS11404 at Equinix Chicago cr2-eqord, AS no longer on exchange.
  • 10:56 akosiaris: sudo cumin 'mw*' 'ip ro ls dev docker0 && sysctl net.ipv4.ip_forward=0' to clear up the docker remnants of the dragonfly evaluation. T286054
  • 10:31 godog: bounce logstash on logstash1007
  • 10:22 elukey: fallback codfw ores to rdb2007 after maintenance
  • 10:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 10:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 09:49 elukey: restart ores uwsgi/celery workers to failover rdb2007 to rdb2008 (and ease the reboot of rdb2007
  • 09:33 topranks: Running homer against mr1-ulsfo to force OOB interface to 100Mb/full-duplex - T288343
  • 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
  • 09:25 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
  • 09:23 cmooney@deploy1002: Finished deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - T288343 (duration: 01m 28s)
  • 09:21 cmooney@deploy1002: Started deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - T288343
  • 08:05 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
  • 07:49 jayme: stopped kube-apiserver on kubestagemaster2001 for testing
  • 07:49 jayme: stopped kube-apiserver on kubestage2001 for testing
  • 07:00 godog: bounce logstash on logstash1008
  • 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:41 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
  • 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates (T289825) (duration: 01m 06s)

2021-08-26

  • 22:06 legoktm: restarted mailman3-web on lists1001 (T289798)
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
  • 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 66717bc: Install Extension Quiz on ja.wikibooks (T289383) (duration: 01m 05s)
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
  • 18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cde8891: Install Extension Quiz on fa.wikibooks (T289381) (duration: 01m 07s)
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d4340e9: Finalize Event Platform migration of EchoEmail and EchoInteraction (T287210) (duration: 01m 07s)
  • 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
  • 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
  • 17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: PageStore: Pass query flags to getPageById() too (T289717 T195069) (duration: 01m 05s)
  • 16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
  • 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 (T289249)
  • 13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - T289536
  • 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:40 Lucas_WMDE: EU backport+config window done
  • 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Allow rendering of <math>0</math> (T288846) (duration: 01m 04s)
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Allow rendering of <math>0</math> (T288846) (duration: 01m 05s)
  • 11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
  • 11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
  • 11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages (duration: 01m 05s)
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:09 vgutierrez: rolling restart of varnishkafka-statsv - T289618
  • 10:07 vgutierrez: disable puppet on cp-text to merge I52cf2a - T286038
  • 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - T286038
  • 09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
  • 09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 08:52 vgutierrez: restart varnishkafka-statsv on cp4032
  • 06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
  • 06:48 godog: more weight to ms-be20[62-65] - T288458
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
  • 06:43 marostegui: Reimage s4 eqiad master (db1138), expect lag on eqiad T288803
  • 06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox

2021-08-25

  • 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:20 urbanecm: Evening B&C window completed
  • 23:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GlobalWatchlist/modules/EntryLog.js: 230aec3: GlobalWatchlistEntryLog: fix storing log id (T288385) (duration: 01m 07s)
  • 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:10 legoktm@deploy1002: Synchronized debug.json: List primary DC servers first (T289246) (duration: 01m 04s)
  • 22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/includes/Content/BoardContent.php: 694b946: BoardContent: Fix deprecation warning (T289625) (duration: 01m 04s)
  • 22:04 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditor.php: 73478bc: Make sure params is an array (T289730) (duration: 01m 04s)
  • 22:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 21:59 brennen: 1.37.0-wmf.20 train status (T281161) blockers should be patched shortly; as we've reached the 15:00 Pacific deploy cutoff for the day, train will resume first thing in US morning
  • 21:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: cc04b33: EventDispatcher: Try really, really hard to read from master (T289717) (duration: 01m 04s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: 34fb2b9: PageStore: Pass query flags to getPageByName() (T289717; T195069) (duration: 01m 06s)
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:14 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: 190d8b7: Use Parser::getUserIdentity() instead of ::getUser() in SimpleCaptcha (T289731) (duration: 01m 05s)
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:03 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ProofreadPage/: 913043a: Fixes exception thrown by FilePagination::getPageNumber (T289728) (duration: 01m 06s)
  • 20:02 brennen: 1.37.0-wmf.20 (T281161) status: blocked at group0; 2/3 blockers have probable patches, all seem to be getting attention, so holding off on blocker mail for now.
  • 19:54 urbanecm: enwikisource: Start server-side upload for one video file (T289698)
  • 19:45 urbanecm: Start server-side upload for ~2 GB tiff file (T289711)
  • 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:28 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
  • 19:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
  • 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:14 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 04s)
  • 19:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
  • 19:10 eileen: tools revision changed from 15bfaa7117 to 14e4125f73
  • 18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:25 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/modules/editor/editors/visualeditor/ui/inspectors/mw.flow.ve.ui.MentionInspector.js: dd464b4: Fix reference to renamed abortAllApiRequests method (T289648) (duration: 01m 04s)
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/skins/WikimediaApiPortal/src/Component/NotificationAlertComponent.php: a5bfcc8: Remove call to text() on string (T289692) (duration: 01m 04s)
  • 18:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e7c8c04: Add Wikimedia ES to $wgCopyUploadsDomains whitelist (T289446) (duration: 01m 04s)
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e6df080: Disable legacy media dom on a few more wikis (T51097) (duration: 01m 05s)
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:15 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5182ac8: Disable upcoming DiscussionTools automatic topic subscriptions for now (duration: 01m 04s)
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2b14eb5: Enable topic subscriptions as a beta feature on Wikipedias except enwiki (T287801) (duration: 01m 06s)
  • 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: Set EntityHandler::generateHTMLOnEdit to false (T285987) (duration: 01m 06s)
  • 17:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase: Backport: Return normalized snaks from SetClaim, SetReference (T289501) (duration: 01m 11s)
  • 17:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:14 ryankemper: T289483 Depooled `wdqs1013`
  • 17:14 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: Set EntityHandler::generateHTMLOnEdit to false (T285987) (duration: 01m 18s)
  • 17:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:22 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in mywiki shell.php session (same issue as T289690)
  • 15:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zh_yuewiki growthexperiments # T289680
  • 15:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments/includes/Config/WikiPageConfigWriter.php: 0b9ca1e: WikiPageConfigWriter: Fix `autopatrol` right name (T288886) (duration: 01m 04s)
  • 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 3/3) (duration: 01m 06s)
  • 14:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 14:56 urbanecm@deploy1002: Synchronized wmf-config/config/: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 2/3) (duration: 01m 05s)
  • 14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680; 1/3) (duration: 01m 06s)
  • 14:54 urbanecm@deploy1002: sync-file aborted: 0ccac4b: Deploy Growth features to 44 new Wikipedias in dark mode (T289680) (duration: 00m 01s)
  • 14:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 14:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 14:46 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 14:42 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=brwiki # T289690, T289680
  • 14:40 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in brwiki shell.php session (T289690)
  • 14:35 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 14:32 urbanecm: mwmaint2002: scap pull # clearing temporary config changes
  • 14:30 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 14:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 14:26 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 14:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 14:23 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php # T289680 # r714765 applied at mwmaint2002
  • 14:22 urbanecm: Apply https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/714765/ at mwmaint2002 temporarily (T289680)
  • 14:21 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 14:20 urbanecm: Create GrowthExperiments DB tables for wikis listed in P17081 (T289680)
  • 14:20 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 14:18 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 14:17 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 14:15 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 14:10 ejegg: updated fundraising CiviCRM from d60442e119 to 13bf3a02df
  • 14:08 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:59 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
  • 13:59 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
  • 13:57 ejegg: updated fundraising CiviCRM from 42bb64c608 to d60442e119
  • 13:53 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
  • 13:53 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
  • 13:51 volans: upgraded spicerack to 0.0.58 on cumin2002
  • 13:37 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213] (duration: 05m 55s)
  • 13:32 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213]
  • 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213] (duration: 00m 07s)
  • 13:31 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213]
  • 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213] (duration: 20m 25s)
  • 13:10 joal@deploy1002: Started deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213]
  • 13:03 jayme: restarted all pods in kube-system namespace in codfw k8s cluster - T289131
  • 12:25 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:21 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 jayme: slowly restarting all pods in kube-system namespace in eqiad k8s cluster - T289131
  • 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-coord1002.eqiad.wmnet
  • 11:32 kharlan@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: Backport: ApiVisualEditorEdit: data-{plugin} is not multi (T289652) (duration: 01m 06s)
  • 11:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 volans: uploaded spicerack_0.0.58 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 10:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 10:49 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/Storage/DerivedPageDataUpdater.php: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987), Part II (duration: 01m 04s)
  • 10:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/content/ContentHandler.php: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987), Part I (duration: 01m 08s)
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:21 jbond: rolling out openssl updates
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/includes: Backport: Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987) (duration: 02m 17s)
  • 10:01 mutante: - removed jmads from wmf group
  • 09:59 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-coord1002.eqiad.wmnet
  • 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 09:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 09:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 08:59 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
  • 08:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
  • 08:17 godog: swift codfw add ms-be20[62-65] with initial weight - T288458
  • 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reimage T288803', diff saved to https://phabricator.wikimedia.org/P17078 and previous config saved to /var/cache/conftool/dbconfig/20210825-064319-marostegui.json
  • 06:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging T288244
  • 06:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging T288244
  • 06:07 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2118 until it's reimaged to buster T289129', diff saved to https://phabricator.wikimedia.org/P17077 and previous config saved to /var/cache/conftool/dbconfig/20210825-060742-kormat.json
  • 06:02 kormat@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary and set section read-write T289129', diff saved to https://phabricator.wikimedia.org/P17076 and previous config saved to /var/cache/conftool/dbconfig/20210825-060222-kormat.json
  • 06:01 kormat@cumin1001: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - T289129', diff saved to https://phabricator.wikimedia.org/P17075 and previous config saved to /var/cache/conftool/dbconfig/20210825-060112-kormat.json
  • 06:00 kormat: Starting s7 codfw failover from db2118 to db2121 - T289129
  • 05:33 eileen: civicrm revision changed from a4ce949828 to 42bb64c608, config revision is 1afcea7f5b
  • 05:28 kormat: Moving s7 codfw replicas under db2121 - T289129
  • 05:27 kormat@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T289129', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20210825-052741-kormat.json
  • 05:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:04:00 on 27 hosts with reason: Primary switchover s7 T289129
  • 05:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:04:00 on 27 hosts with reason: Primary switchover s7 T289129
  • 02:06 eileen: civicrm revision changed from 8ed303f2d1 to a4ce949828, config revision is ac2d75d4a8
  • 00:53 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 00:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 00:47 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .

2021-08-24

  • 22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per T282873#7303828
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a6fd96b: Growth features: Promote 9 wikis out of dark mode (T287871; T287874; T287872; T287880; T287868; T287873; T287879; T287875; T287876) (duration: 01m 25s)
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
  • 20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
  • 20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
  • 20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
  • 17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
  • 15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
  • 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
  • 14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
  • 13:12 XioNoX: push pfw policies - T289353
  • 12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
  • 12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
  • 12:33 godog: test patched python3-eventlet on thanos-fe1003 - T283714
  • 12:30 marostegui: Install 10.4.21 on clouddb1015
  • 11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
  • 11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
  • 09:08 jbond: upload new statograph version
  • 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 (T289249)
  • 08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 (T289249)
  • 08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - T283714
  • 07:51 dcausse: repool wdqs1012 T289551
  • 07:29 dcausse: restarting blazegraph on wdqs1012
  • 07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
  • 07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
  • 06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
  • 06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
  • 03:51 rzl: rzl@wdqs1012:~$ sudo depool
  • 03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
  • 00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
  • 00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
  • 00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`

2021-08-23

  • 23:41 ryankemper: T285355 `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
  • 23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 18:56 tgr: morning deploys done
  • 18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: Add Link: store when tasks were generated (T284551) (duration: 00m 57s)
  • 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: wmfSetupEtcd only supports array input (duration: 00m 57s)
  • 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:23 dancy@deploy1002: Synchronized wmf-config: Config: Use array format to specify etcd server (duration: 00m 57s)
  • 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: Allow protocol for etcd server to be specified (duration: 00m 57s)
  • 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
  • 17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
  • 16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
  • 16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
  • 16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
  • 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 3/3) (duration: 00m 56s)
  • 14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 2/3) (duration: 00m 57s)
  • 14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 26fe6d7: ckbwiki: Enable Growth features in dark mode (T287867; 1/3) (duration: 00m 57s)
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=T287867 # T287867
  • 14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # T287867
  • 14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
  • 14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
  • 13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: change rdb* servers in eqiad and codfw (T280582) (duration: 00m 57s)
  • 11:35 Lucas_WMDE: EU backport+config window done
  • 11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480) (2/2) (duration: 00m 57s)
  • 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480) (1/2) (duration: 00m 58s)
  • 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable NewUserMessage on hiwiktionary" (T287091) (duration: 00m 57s)
  • 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
  • 10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
  • 09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Add extra sleep option between each batch in pruneRevData.php (T289249) (duration: 00m 58s)
  • 09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master (T288400, T288897)
  • 09:53 urbanecm: Deploy security patch for T289408
  • 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
  • 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
  • 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
  • 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 09:01 godog: pooling swift in eqiad - T288458
  • 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set request languages rdf output for wikidata to true (T285795) (duration: 00m 57s)
  • 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
  • 07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249) (duration: 00m 59s)
  • 07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
  • 07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE

2021-08-21

  • 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-20

  • 23:17 legoktm: deployed patch for T289385
  • 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1141.eqiad.wmnet
  • 17:01 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1141.eqiad.wmnet
  • 16:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1140.eqiad.wmnet
  • 16:56 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1140.eqiad.wmnet
  • 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1139.eqiad.wmnet
  • 16:54 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1139.eqiad.wmnet
  • 16:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1134.eqiad.wmnet
  • 16:43 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1134.eqiad.wmnet
  • 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1133.eqiad.wmnet
  • 16:36 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1133.eqiad.wmnet
  • 15:37 jayme: deleting various pods from staging to have them recreated with priorities - T289131
  • 15:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1129.eqiad.wmnet
  • 15:23 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1129.eqiad.wmnet
  • 15:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
  • 14:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
  • 13:54 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 13:48 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:00 jayme: enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - T289131
  • 11:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:35 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1001.eqiad.wmnet
  • 09:32 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 09:23 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1001.eqiad.wmnet
  • 08:48 godog: roll depool/pool thanos-fe to apply swift change - T288815
  • 08:43 godog: temp depool thanos-fe2003 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/713815
  • 08:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
  • 08:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
  • 07:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
  • 07:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
  • 07:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
  • 07:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
  • 07:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
  • 07:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
  • 06:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 06:07 TimStarling: sending election email to 44k people
  • 03:15 legoktm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Score/scripts/removeTagline.php: removeTagline: Set explicit pcre.backtrack_limit (T289298) (duration: 00m 58s)
  • 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/makeMailingList.php: code that uses said hack (duration: 00m 57s)
  • 00:12 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/includes/User/LocalAuth.php: hack for mailout (duration: 00m 58s)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-19

  • 23:15 brennen: ended backport & config window early, as no patches were scheduled and no new attendees for this week
  • 22:42 ejegg: updated payments-wiki from 0a27dbe9b6 to 564daed816
  • 21:20 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune (T289249)
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.19
  • 19:07 razzi@deploy1002: Finished deploy [analytics/aqs/deploy@57c253e]: Deploy aqs 9c062f2 (duration: 03m 30s)
  • 19:03 razzi@deploy1002: Started deploy [analytics/aqs/deploy@57c253e]: Deploy aqs 9c062f2
  • 18:27 razzi: Beginning aqs deploy process
  • 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon2001.codfw.wmnet
  • 17:49 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2001.codfw.wmnet
  • 17:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1001.eqiad.wmnet
  • 17:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1001.eqiad.wmnet
  • 17:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1004.eqiad.wmnet
  • 17:01 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1004.eqiad.wmnet
  • 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1003.eqiad.wmnet
  • 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable Score with Shellbox on most public wikis (T257066) (duration: 01m 08s)
  • 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1003.eqiad.wmnet
  • 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1002.eqiad.wmnet
  • 16:31 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
  • 16:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts maps1002.eqiad.wmnet
  • 16:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
  • 16:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1001.eqiad.wmnet
  • 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1001.eqiad.wmnet
  • 16:14 hnowlan: starting decommission of old eqiad maps hardware
  • 16:10 cwhite: remove rotated logstash-plain-* and logstash-json-* logs on logstash collectors
  • 16:00 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:53 dpifke@deploy1002: Finished deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again T281243 (duration: 00m 06s)
  • 15:52 dpifke@deploy1002: Started deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again T281243
  • 15:50 Amir1: test2wiki)> delete from flaggedtemplates where ft_rev_id not in (select fp_stable from flaggedpages); (T289249)
  • 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
  • 15:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
  • 15:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
  • 15:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
  • 15:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:25 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
  • 15:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
  • 15:04 godog: clean logstash json logs off logstash hosts
  • 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:49 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:36 effie: enable puppet on mediawiki and memcached servers for 713842
  • 14:26 effie: disable puppet on mediawiki and memcached servers for 713842
  • 13:58 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:49 urbanecm: Start server-side upload for 1 video file (T288384)
  • 13:48 urbanecm: Start server-side upload for 1 video file (T288554)
  • 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 13:45 urbanecm: Start server-side upload for 1 video file (T288628)
  • 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 13:42 urbanecm: Start server-side upload for 1 video file (T289203)
  • 13:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 13:34 kormat: reconfiguring replication tree on pc3 T284825
  • 13:30 kormat: reconfiguring replication tree on pc2 T284825
  • 13:24 kormat: reconfiguring replication tree on pc1 T284825
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:09 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote new h/w to primary of eqiad pc sections T284825 (duration: 01m 08s)
  • 12:35 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:11 Lucas_WMDE: EU backport+config window done
  • 12:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: Update termbox (T236893, T286775) (duration: 01m 08s)
  • 11:56 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Revert "Don't set termbox v2 tags yet" (T236893, T286775) (duration: 01m 06s)
  • 11:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: Update termbox (T236893, T286775) (duration: 01m 08s)
  • 11:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Backport: Update termbox (T236893T286775) (duration: 00m 01s)
  • 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:45 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:42 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:36 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:12 twentyafterfour: restart php-fpm on phab1001
  • 10:02 godog: roll-reload nginx on ms-fe to apply config change
  • 08:48 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:41 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
  • 04:20 effie: pool mw2383 - T286463
  • 01:13 ejegg: updated fundraising CiviCRM from 73f6ec9190 to 8ed303f2d1
  • 00:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox

2021-08-18

  • 22:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping (duration: 02m 09s)
  • 22:14 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping
  • 21:15 jgleeson: civicrm changed from 66568246a2 to 73f6ec9190
  • 19:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping (duration: 02m 12s)
  • 19:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping
  • 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
  • 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
  • 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 legoktm: Successfully published image docker-registry.discovery.wmnet/nodejs12-devel:0.0.1, docker-registry.discovery.wmnet/nodejs12-slim:0.0.1 (T284346)
  • 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 559dd70: Enable page previews on German Wikivoyage (T264305) (duration: 01m 08s)
  • 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 35113b6: Enable DiscussionTools topicsubscription as beta feature on phase 1 wikis (T287800) (duration: 01m 25s)
  • 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 ejegg: updated matching gift employers list on payments-wiki
  • 15:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:50 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:26 effie: enable puppet on alert*
  • 14:11 effie: disable puppet on alerts* to avoid alert flood due to 713494
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:57 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: change rdb* servers in eqiad and codfw (T280582) (duration: 01m 51s)
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:41 godog: bounce logstash on logstash100[89]
  • 13:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:24 effie: mw2383 is depooled - T286463
  • 13:01 kormat: Deploying wmfmariadbpy 0.7.2 T289139
  • 13:01 kormat: uploaded wmfmariadbpy 0.7.2 to apt.wm.o
  • 11:38 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:36 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:35 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:12 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
  • 11:03 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:47 effie: pooling mw2383 - T286463
  • 10:41 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
  • 10:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
  • 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
  • 10:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
  • 09:36 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618] (duration: 05m 48s)
  • 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618]
  • 09:30 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618] (duration: 00m 07s)
  • 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618]
  • 09:29 joal@deploy1002: Finished deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618] (duration: 32m 29s)
  • 08:57 joal@deploy1002: Started deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618]
  • 04:38 marostegui: Drop user2 from s6 - T289051
  • 02:03 rzl@cumin2001: conftool action : get/pooled; selector: service=docker-registry
  • 00:39 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again (T281243) (duration: 00m 05s)
  • 00:39 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again (T281243)
  • 00:38 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark (T281243) (duration: 00m 06s)
  • 00:38 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark (T281243)

2021-08-17

  • 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:32 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php: T288233: Work around cache failure for wikitech (duration: 01m 28s)
  • 23:05 tzatziki: resetting email for vanished user
  • 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:44 urbanecm: Deploy security patch for T289063
  • 20:30 brennen: running scap pull on mw2383
  • 20:29 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.16 (duration: 02m 01s)
  • 20:20 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.15 (duration: 06m 51s)
  • 20:14 brennen: pruning 1.37.0-wmf.15 and .16 (T281160)
  • 20:06 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.18/includes/block/BlockUser.php: d377d4f: BlockUser: Restore blocking autoblocked IP addresses (T287798) (duration: 01m 08s)
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.19
  • 19:02 brennen: 1.37.0-wmf.19 train status: no current blockers, proceeding to group0 (T281160)
  • 17:44 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/includes/: Backport: Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998) (duration: 01m 13s)
  • 17:41 urbanecm: [urbanecm@mw2383 ~]$ scap pull # to clear an icinga alert
  • 17:39 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/: Backport: Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998) (duration: 01m 14s)
  • 17:15 bblack: authdns2001,dns[245]001: upgrade gdnsd package to 3.8.0-1~wmf1 (all authdns upgraded after this)
  • 17:07 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:04 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:56 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.19 (duration: 38m 24s)
  • 16:50 bblack: dns1001: upgrade gdnsd package to 3.8.0-1~wmf1
  • 16:25 bblack: dns3001: upgrade gdnsd package to 3.8.0-1~wmf1
  • 16:17 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.19
  • 16:13 brennen: 1.37.0-wmf.19 train: running scap prep, branched at 79c9b9e
  • 16:08 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:06 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:55 urbanecm: Deploy a security patch for T289064
  • 15:37 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:32 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:37 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2013 to primary of pc3 T284825 (duration: 00m 58s)
  • 14:25 jynus: running a full testwiki media backup on a single thread, single worker T262668
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:20 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2012 to primary of pc2 T284825 (duration: 00m 59s)
  • 13:53 jynus: rolling restart of minio on backup server
  • 13:51 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 11:29 phuedx@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Jobs/TallyElectionJob.php: Backport: tallyElectionJob: Catch and log exceptions (T288361) (duration: 00m 58s)
  • 11:16 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17038 and previous config saved to /var/cache/conftool/dbconfig/20210817-111629-mvernon.json
  • 11:15 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:01 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17037 and previous config saved to /var/cache/conftool/dbconfig/20210817-110125-mvernon.json
  • 10:46 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17035 and previous config saved to /var/cache/conftool/dbconfig/20210817-104622-mvernon.json
  • 10:31 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: buster reimage T288244', diff saved to https://phabricator.wikimedia.org/P17034 and previous config saved to /var/cache/conftool/dbconfig/20210817-103118-mvernon.json
  • 10:07 effie: enable puppet on mediawiki hosts
  • 09:52 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
  • 09:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
  • 09:20 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 depooling: reimage to buster T288244', diff saved to https://phabricator.wikimedia.org/P17033 and previous config saved to /var/cache/conftool/dbconfig/20210817-092045-mvernon.json
  • 09:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1456.eqiad.wmnet
  • 09:16 Emperor: reimaging db2121 to buster T288244
  • 09:08 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1456.eqiad.wmnet
  • 08:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1276-1279].eqiad.wmnet
  • 08:29 effie: disable puppet on mediawiki hosts to merge 712920
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1276-1279].eqiad.wmnet
  • 08:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
  • 08:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
  • 08:21 mutante: mw2383 - scap pull (still depooled because T286463 but alerts in Icinga since a while)
  • 08:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
  • 08:18 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:18 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw127[6-9].eqiad.wmnet
  • 08:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
  • 08:17 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw127[6-9].eqiad.wmnet
  • 08:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad T280203
  • 08:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad T280203
  • 08:06 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:00 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw144[7-9].eqiad.wmnet
  • 07:59 mutante: mw1384 - start failed ferm service
  • 07:59 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw1450.eqiad.wmnet
  • 07:52 mutante: mw1451 through mw1455 - fresh hardware pooled the first time as appservers
  • 07:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw145[1-5].eqiad.wmnet
  • 07:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw145[1-5].eqiad.wmnet
  • 07:48 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw145[1-5].eqiad.wmnet
  • 07:44 marostegui: Drop aft_feedback tables on x1 T250715
  • 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
  • 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[7-9].eqiad.wmnet
  • 06:57 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Entities/Election.php: T288924 (duration: 00m 57s)
  • 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:55 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/cli/dump.php: T288924 (duration: 00m 58s)
  • 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:59 TimStarling: foreachwikiindblist securepollglobal mysql.php --write -- -e 'insert into securepoll_properties (pr_entity,pr_key,pr_value) select el_entity,'\mobile-jump-url'\,'\https://vote.m.wikimedia.org/wiki/Special:SecurePoll'\ from securepoll_elections where el_title='\DWalden STV Election Test 456'\ limit 1;'
  • 05:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:37 tstarling@deploy1002: Finished scap: collected SecurePoll maintenance scripts and bug fix (duration: 04m 12s)
  • 05:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:33 tstarling@deploy1002: Started scap: collected SecurePoll maintenance scripts and bug fix
  • 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:11 eileen: civicrm revision changed from 175a3101f7 to 66568246a2, config revision is 7bdc78073d
  • 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:44 eileen: civicrm revision changed from ba0c7705bb to 175a3101f7, config revision is 7bdc78073d
  • 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eccdd3e: Growth mentor dashboard: Enable on testwiki (T278920) (duration: 00m 59s)

2021-08-16

  • 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:20 urbanecm: Evening B&C window done
  • 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a14868b: Enable NewUserMessage on hiwiktionary (T287091) (duration: 01m 00s)
  • 23:15 eileen: civicrm revision changed from 1e32084622 to ba0c7705bb, config revision is 7bdc78073d
  • 22:13 bblack: dns[1235]002: upgrade gdnsd package to 3.8.0-1~wmf1
  • 21:31 bblack: authdns1001: upgrade gdnsd package to 3.8.0-1~wmf1
  • 21:28 bblack: dns4002: upgrade gdnsd package to 3.8.0-1~wmf1
  • 20:38 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 20:38 bstorm@cumin1001: Added views for new wiki: labswiki T287442
  • 20:37 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 20:36 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 20:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 20:35 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 20:35 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:48 dancy: Restarted Jenkins due to stuck jobs.
  • 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
  • 17:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
  • 17:34 cmjohnson1: installing new line card in slot1 cr2-eqiad T277339
  • 17:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Try to use EditStash before re-rendering (T288639) (duration: 00m 59s)
  • 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:25 XioNoX: cr1-eqiad> request chassis fpc offline slot 5 - T277339
  • 17:17 cmjohnson1: installing new line card in slot1 cr1-eqiad T277339
  • 17:11 ejegg: updated fundraising CiviCRM from f3895dc907 to 1e32084622
  • 17:08 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port set pic-slot 1 member 8 port 1 - T288834
  • 17:05 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port delete pic-slot 1 member 8 port 1 - T288834
  • 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 cwhite: restart logstash on logstash1008
  • 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:01 mutante: LDAP - added user tandic to nda group (T288527)
  • 15:37 ryankemper: [WDQS] Re-pooled `codfw`: `ryankemper@puppetmaster1001:~$ sudo -i confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=codfw' set/pooled=true`
  • 14:42 mutante: miscweb - deploying new microsite for Wikidata Query Builder subpage (T266703)
  • 14:41 mutante: mw1455 - works fine after a reimage, unknown why it didnt last time, but ok :)
  • 14:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
  • 14:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
  • 13:53 mutante: mw1455 - mysteriously showing a bunch of issues in icinga, broken packages, envoy, memcached etc, after recent fresh install, trying another reimage (T273915)
  • 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseFineGrainedLuaTracking (T288612) (duration: 00m 58s)
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 13:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['fineGrainedLuaTracking'] (T288612) (duration: 00m 58s)
  • 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612) (beta, 2/2) (duration: 00m 59s)
  • 13:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612) (prod, 1/2) (duration: 00m 59s)
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting 'useTermsTableSearchFields' Wikibase option (T288612) (duration: 00m 59s)
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:22 Lucas_WMDE: EU backport+config window done (slightly belatedly)
  • 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Pages/VotePage.php: allow linking by title (duration: 00m 58s)
  • 12:17 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
  • 12:15 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: Support null content in parser tag hook (T288846) (hopefully also fixes T288790) (duration: 00m 59s)
  • 12:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
  • 12:14 kormat: clean up old /root/.my.cnf files T150446
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add extendedconfirmed on zhwiki (T287322) + Config: Fix extendedconfirmed for bots on zhwiki (T287322) (duration: 01m 01s)
  • 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:26 Lucas_WMDE: namespaceDupes.php for T287024 finished
  • 11:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php hrwiki --fix --add-prefix=T287024/ | tee T287024.out # T287024
  • 11:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add namespace aliases for hr.wiki (T287024) (duration: 00m 59s)
  • 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:32 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Add tags for wikidata edits (T236893) (duration: 00m 58s)
  • 09:16 gehel: depooling wdqs codfw to allow catching up on lag
  • 08:49 jynus: replacing s2 with s4 on db2097 T287230
  • 08:28 gehel: repool wdqs eqiad (`confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=true`) - codfw currently overloaded
  • 07:47 marostegui: Rename aft_feedback tables on db2115, db2131 - T250715
  • 06:41 TimStarling: on votewiki, set voter-privacy option to 1 on all prior elections T288924
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17031 and previous config saved to /var/cache/conftool/dbconfig/20210816-055445-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17030 and previous config saved to /var/cache/conftool/dbconfig/20210816-055427-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17029 and previous config saved to /var/cache/conftool/dbconfig/20210816-053941-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17028 and previous config saved to /var/cache/conftool/dbconfig/20210816-053924-root.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17027 and previous config saved to /var/cache/conftool/dbconfig/20210816-052437-root.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17026 and previous config saved to /var/cache/conftool/dbconfig/20210816-052420-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json
  • 04:49 marostegui: Upgrade db2088 (s1 and s2) to 10.4.21
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json

2021-08-15

  • 20:02 addshore: restarting blazegraph on wdqs2004
  • 16:13 andrew@deploy1002: Finished deploy [horizon/deploy@c23a155]: adding cinder volume resize warning (duration: 03m 52s)
  • 16:10 andrew@deploy1002: Started deploy [horizon/deploy@c23a155]: adding cinder volume resize warning

2021-08-14

  • 03:54 legoktm[m]: restarting mailman3 on lists1001, bounce runner crashed (T288880)

2021-08-13

  • 18:43 bblack: reprepro: uploaded gdnsd-3.8.0-1~wmf1 to buster-wikimedia - T252132
  • 17:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 17:32 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 17:06 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 17:05 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 15:39 mutante: mw1451, mw1452, mw1454 - rebooting after reimage, memcached needs one
  • 15:30 mutante: mw1453 - racadm serveraction powercycle (down and was working until right before the switch issue)
  • 15:18 godog: restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
  • 15:14 godog: restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
  • 15:02 mutante: etherpad1002 - started failed ferm
  • 15:00 mutante: an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in )
  • 14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet
  • 14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet
  • 14:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
  • 14:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
  • 14:50 mutante: an-worker1079 - started failed ferm
  • 14:47 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet
  • 14:46 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet
  • 14:45 mutante: an-worker1095 - started ferm, service failed
  • 14:44 mutante: an-worker1082 - started ferm (was failed due to DNS hickup)
  • 14:44 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet
  • 14:43 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet
  • 14:41 mutante: mw1419 - started ferm
  • 13:35 sukhe: ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo
  • 13:23 mutante: mw1453 - manual powercycle after it never rebooted when the reimage cookbook tries to trigger one
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 13:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
  • 12:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
  • 12:53 godog: set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - T288815
  • 12:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
  • 12:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
  • 12:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
  • 12:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
  • 12:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
  • 12:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
  • 12:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
  • 12:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
  • 12:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
  • 12:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
  • 12:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
  • 12:26 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
  • 12:24 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=commonswiki --jobqueue # T288683
  • 12:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
  • 12:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
  • 12:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
  • 12:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1444.eqiad.wmnet
  • 12:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
  • 12:21 mutante: mw1444 - scap pull, pooled as new API server for the first time
  • 12:20 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1444.eqiad.wmnet
  • 12:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
  • 11:59 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=mediawikiwiki --jobqueue # T288683
  • 11:36 topranks: cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer (T277340)
  • 11:11 jelto: mw1455 - powering on via mgmt - OS install, initial setup (T279309, T273915)
  • 10:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
  • 10:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
  • 10:07 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2003.codfw.wmnet
  • 09:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
  • 09:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
  • 09:42 mutante: mw1448, mw1449, mw1450 - powering on via mgmt - OS install, initial setup (T279309, T273915)
  • 09:38 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
  • 09:35 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
  • 09:35 mutante: mw1444 - signed puppet cert, initial run (after hardware fix) T279309
  • 09:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet
  • 09:17 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet
  • 09:15 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
  • 08:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
  • 08:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
  • 08:40 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
  • 05:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
  • 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
  • 01:02 tgr: running extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php for Growth wikis

2021-08-12

  • 23:50 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set archive namespaces on foundationwiki to 'noindex,follow' (T288763) (duration: 00m 59s)
  • 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 cjming@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: Add Link: fix invalidation on non-addlink edit (T283606) (duration: 01m 00s)
  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:09 tgr: T283867 running userOptions.php on Growth wikis as per T283867#7280296
  • 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:57 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Don't generate HTML when asking for ParserOutput (T288639) (duration: 00m 58s)
  • 21:52 urbanecm: Run `mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=$WIKI --jobqueue` for a bunch of Translate-enabled wikis (T288683)
  • 21:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:30 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.18 refs T281159
  • 21:13 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: sync Ic27418 to unblock the train refs T288775 and T281159 (duration: 01m 07s)
  • 20:56 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwikidatawiki --jobqueue # T288683, errored out
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwiki --jobqueue # T288683
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:24 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # T288683
  • 20:13 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # T288683
  • 19:43 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Translate/src/PageTranslation/TranslationPage.php: sync I2f46ab which should fix T288683 & T288700 thus unblocking the train: T281159 (duration: 01m 07s)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4002.wikimedia.org
  • 16:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
  • 16:33 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1005: (duration: 00m 15s)
  • 16:32 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1005:
  • 16:32 effie: enabling puppet on mediawiki servers && rolling restart mcrouter
  • 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1006: (duration: 00m 15s)
  • 16:31 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1006:
  • 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1007: (duration: 00m 15s)
  • 16:30 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1007:
  • 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1008: (duration: 00m 15s)
  • 16:29 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1008:
  • 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1009: (duration: 00m 17s)
  • 16:28 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1009:
  • 16:27 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1010: (duration: 00m 15s)
  • 16:27 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1010:
  • 16:26 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2005: (duration: 00m 24s)
  • 16:26 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2005:
  • 16:24 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2006: (duration: 00m 23s)
  • 16:24 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2006:
  • 16:23 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2007: (duration: 00m 27s)
  • 16:23 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2007:
  • 16:22 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2008: (duration: 00m 24s)
  • 16:21 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2008:
  • 16:16 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2009: (duration: 00m 24s)
  • 16:15 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2009:
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:14 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2010: (duration: 00m 23s)
  • 16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2010:
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s)
  • 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5
  • 15:50 papaul: powerdown ms-be2060 for relocation
  • 15:49 mutante: netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the previous error when running makevm cookbook (T288630)
  • 15:47 mutante: netbox - deleted 198.35.26.6 (doh4002)
  • 15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org
  • 15:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
  • 15:33 moritzm: importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8 T287960
  • 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet
  • 15:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
  • 15:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet
  • 14:48 papaul: reset to factory ps-test-d8-codfw
  • 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
  • 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
  • 14:33 papaul: reset to factory ps2-test-d8-codfw
  • 14:25 hnowlan: reenabling puppet on P:cassandra
  • 13:57 hnowlan: disabling puppet on P:cassandra to test removal of cassandra-metrics-agent
  • 13:50 effie: disable puppet on mediawiki hosts to merge 705852
  • 13:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 13:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet
  • 13:20 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet
  • 13:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 12:43 godog: upgrade NIC firmware on thanos-be2* / thanos-fe2* - T286722
  • 12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 12:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
  • 12:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
  • 12:09 godog: upgrade NIC firmware on thanos-be1* - T286722
  • 12:08 godog: upgrade NIC firmware on thanos-fe100[34] - T286722
  • 12:04 godog: upgrade NIC firmware on thanos-fe100[12] - T286722
  • 11:57 moritzm: installing openexr security updates
  • 11:47 moritzm: installing bluez security updates on buster
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts
  • 10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json
  • 10:18 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:13 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:08 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 09:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/: Backport: Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724) (2/2) (duration: 01m 12s)
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825
  • 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/data-access/: Backport: Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724) (1/2) (duration: 01m 08s)
  • 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P17015 and previous config saved to /var/cache/conftool/dbconfig/20210812-092909-root.json
  • 09:28 kormat: reconfiguring replication tree for pc1 T284825
  • 09:27 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2011 to primary of pc1 T284825 (duration: 01m 10s)
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 80%: After reimage', diff saved to https://phabricator.wikimedia.org/P17014 and previous config saved to /var/cache/conftool/dbconfig/20210812-091406-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 60%: After reimage', diff saved to https://phabricator.wikimedia.org/P17013 and previous config saved to /var/cache/conftool/dbconfig/20210812-085902-root.json
  • 08:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725
  • 08:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725
  • 08:53 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Adding new pc hosts (duration: 01m 09s)
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 08:48 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P17012 and previous config saved to /var/cache/conftool/dbconfig/20210812-084359-root.json
  • 08:43 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 08:38 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 08:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 40%: After reimage', diff saved to https://phabricator.wikimedia.org/P17011 and previous config saved to /var/cache/conftool/dbconfig/20210812-082855-root.json
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 30%: After reimage', diff saved to https://phabricator.wikimedia.org/P17010 and previous config saved to /var/cache/conftool/dbconfig/20210812-081351-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 20%: After reimage', diff saved to https://phabricator.wikimedia.org/P17009 and previous config saved to /var/cache/conftool/dbconfig/20210812-075848-root.json
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 07:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 15%: After reimage', diff saved to https://phabricator.wikimedia.org/P17008 and previous config saved to /var/cache/conftool/dbconfig/20210812-074344-root.json
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P17007 and previous config saved to /var/cache/conftool/dbconfig/20210812-072841-root.json
  • 07:26 godog: temp upgrade thanos to 0.22.0 on thanos-fe2001 to help debug a potential upstream issue
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P17006 and previous config saved to /var/cache/conftool/dbconfig/20210812-071337-root.json
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P17005 and previous config saved to /var/cache/conftool/dbconfig/20210812-065833-root.json
  • 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: fix for T288711 failure of election creation (duration: 01m 09s)
  • 06:47 moritzm: updating bullseye installations to the latest state of testing
  • 06:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:36 moritzm: installing c-ares security updates on Bullseye
  • 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:00 marostegui: Failover m3 from db1132 to db1107 - T288197
  • 05:15 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal after nuking wdqs2004's" --blazegraph_instance blazegraph`
  • 05:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:14 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 04:45 eileen: tools revision changed from c26a8c0cb6 to 15bfaa7117
  • 04:44 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:43 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 02m 07s)
  • 04:41 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
  • 04:41 ryankemper: [WDQS Deploy] Re-rolling deploy so that `wdqs2004` gets deployed to
  • 04:41 ryankemper: [WDQS] `wdqs2004`'s disk is full due to overinflated `wikidata.jnl`, nuking and depooling: `sudo rm -fv /srv/wdqs/wikidata.jnl && sudo depool`
  • 04:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 17m 03s)
  • 04:26 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.81` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:23 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
  • 04:21 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.81`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:40 eileen: process-control config revision is 7bdc78073d
  • 03:01 eileen: civicrm revision changed from d8ebf45819 to f3895dc907, config revision is 7bdc78073d

2021-08-11

  • 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 2/2 (duration: 01m 08s)
  • 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:06 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 1/2 (duration: 01m 08s)
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:32 legoktm@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 07s)
  • 22:30 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 08s)
  • 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Avoid using deprecated WikiPage::prepareContentForEdit (T288639) (duration: 01m 08s)
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:29 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: Avoid using deprecated WikiPage::prepareContentForEdit (T288639) (duration: 01m 07s)
  • 21:18 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:58 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 20:30 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --move-talk --add-prefix=T288643 --fix # T288643
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:23 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Popups: Log VirtualPageView events to Event Platform (T288655) (duration: 01m 06s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:20 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Popups: Log VirtualPageView events to Event Platform (T288655) (duration: 01m 09s)
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:29 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.18 refs T281159 (duration: 01m 08s)
  • 19:28 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.18 refs T281159
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159
  • 19:01 jgleeson: payments-wiki updated from a70aaa7944 to 0a27dbe9b6
  • 18:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 18:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:23 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:22 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:21 bstorm: removed thirdparty/kubeadm-k8s-1-17 in reprepro
  • 18:21 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:20 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:19 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill (duration: 02m 21s)
  • 18:02 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill
  • 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/pagers/ContribsPager.php: T288563 Don't explode Special:Contributions on extension-formatted rows (3/3) (duration: 01m 06s)
  • 17:34 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionFactory.php: T288563 Don't explode Special:Contributions on extension-formatted rows (2/3) (duration: 01m 08s)
  • 17:32 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionStore.php: T288563 Don't explode Special:Contributions on extension-formatted rows (1/3) (duration: 01m 09s)
  • 16:22 dancy: Results of testing php_fpm_always_restart: php_fpm_always_restart=false: 1m19.942s php_fpm_always_restart=true: 3m12.836s
  • 16:19 dancy@deploy1002: Synchronized README: Testing scap php-rpm rolling restart (after) (duration: 03m 12s)
  • 16:16 thcipriani: moment of truth for php-fpm-always-restart in scap
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 16:05 dancy@deploy1002: Synchronized README: Testing scap php-rpm rolling restart (before) (duration: 01m 19s)
  • 15:37 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 15:12 moritzm: import openjdk-8 8u302-b08-1+wmf1 to bullseye-wikimedia (bootstrap build, not to be used yet) T287960
  • 15:02 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast4002.wikimedia.org
  • 14:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 14:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
  • 14:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4002.wikimedia.org
  • 14:44 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
  • 14:44 sukhe: s/depool/decommission bast4002.wikimedia.org - T288579
  • 14:43 sukhe: depool bast4002.wikimedia.org - T288579
  • 14:23 moritzm: installing mx2002 T286911
  • 14:21 hnowlan: disabled cassandra-metrics-collector on maps*
  • 13:33 moritzm: installing Java 8/Java 11 security updates on various analytics hosts
  • 13:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 12:45 moritzm: imported openjdk-8 8u302-b08-1~deb10u1 to component/jdk8 for buster-wikimedia (forward port of the latest Java 8 security release)
  • 12:32 godog: roll-restart prometheus T284213
  • 12:16 moritzm: installing c-ares security updates on stretch
  • 12:16 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 12:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:33 Lucas_WMDE: EU backport+config window done
  • 11:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientEntityNamespaces (T257260) (duration: 01m 08s)
  • 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['entityNamespaces'] (T257260) (duration: 01m 07s)
  • 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseRepoEntityNamespaces (T257260) (duration: 01m 08s)
  • 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBRepoSettings['entityNamespaces'] (T257260) (duration: 01m 08s)
  • 11:17 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:17 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/: Backport: Add ad-hoc logging to tally process (T288366) (duration: 01m 09s)
  • 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable Collection sidebar link on English Wikisource (T288021) (duration: 01m 14s)
  • 10:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:42 moritzm: rolling restart of Buster-based maps services to pick up c-ares security updates
  • 10:37 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:20 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
  • 09:50 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/SpecialWhatLinksHere.php: Backport: Fix SelectQueryBuilder use in SpecialWhatLinksHere (T288565) (duration: 01m 08s)
  • 09:50 godog: upgrade thanos on cloudmetrics* - T288604
  • 09:26 godog: upgrade thanos on prometheus* - T288604
  • 09:21 elukey: run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full)
  • 09:19 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:15 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 09:05 godog: upgrade thanos on thanos-fe* - T288604
  • 08:23 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Minor cleanup of parsercache entries (duration: 01m 17s)
  • 08:19 moritzm: restart Aphlict to pick up c-ares security updates
  • 08:17 moritzm: restart Turnilo on an-tool1007 to pick up c-ares security updates
  • 08:02 moritzm: rolling restart of AQS to pick up the c-ares security update
  • 07:09 moritzm: restart etherpad-lite on etherpad1002 to pick up c-ares security updates
  • 06:59 _joe_: deleting the staging deployment of mwdebug
  • 05:55 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
  • 05:22 marostegui: Stop replication on db2107 T287454
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 T287454', diff saved to https://phabricator.wikimedia.org/P16999 and previous config saved to /var/cache/conftool/dbconfig/20210811-051856-marostegui.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2104 to s2 master and set section read-write T287454', diff saved to https://phabricator.wikimedia.org/P16998 and previous config saved to /var/cache/conftool/dbconfig/20210811-051041-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16997 and previous config saved to /var/cache/conftool/dbconfig/20210811-050040-marostegui.json
  • 05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - T287454
  • 04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 T287454', diff saved to https://phabricator.wikimedia.org/P16996 and previous config saved to /var/cache/conftool/dbconfig/20210811-041625-root.json
  • 04:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
  • 04:15 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
  • 03:45 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 03:45 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 01:49 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 05s)
  • 01:49 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
  • 01:47 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 06s)
  • 01:47 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
  • 01:38 legoktm@deploy1002: Synchronized docroot/noc/conf/index.php: noc: Expose primary datacenter on conf/ (duration: 01m 06s)
  • 01:22 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 01:22 bstorm@cumin1001: Added views for new wiki: jvwikisource T286245
  • 01:00 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 00:38 bstorm@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
  • 00:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki

2021-08-10

  • 23:33 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable user links feature for pilot wikis, modern vector (T288274) (duration: 01m 08s)
  • 23:18 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:06 krinkle@deploy1002: Synchronized wmf-config/: I13e88c303a, T284418 (duration: 01m 07s)
  • 23:02 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:58 eileen: process-control config revision is 7bdc78073d
  • 22:50 krinkle@deploy1002: Synchronized wmf-config/: I8052636, I2038702b7e0 (duration: 01m 21s)
  • 21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
  • 21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
  • 21:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
  • 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
  • 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
  • 21:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
  • 21:42 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
  • 21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
  • 21:40 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
  • 21:40 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo pool`
  • 21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
  • 21:40 ryankemper: T288501 `ryankemper@wdqs2003:~$ sudo pool`
  • 21:38 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
  • 21:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
  • 21:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
  • 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
  • 21:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
  • 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
  • 21:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
  • 21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
  • 21:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.37.0-wmf.18"
  • 21:02 krinkle@deploy1002: Synchronized wmf-config/: I3b54d163b6 (duration: 01m 09s)
  • 20:54 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: If7a8d6b6 (duration: 01m 22s)
  • 20:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
  • 20:42 krinkle@deploy1002: Synchronized wmf-config/: Ic5ff34b (duration: 01m 08s)
  • 20:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
  • 20:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
  • 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
  • 20:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
  • 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
  • 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
  • 20:31 krinkle@deploy1002: Synchronized docroot/noc/: Ic013a93998f (duration: 01m 37s)
  • 20:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
  • 20:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
  • 20:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
  • 20:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
  • 20:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
  • 19:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
  • 19:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
  • 19:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
  • 19:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
  • 19:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
  • 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
  • 19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
  • 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
  • 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
  • 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
  • 19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs T281159
  • 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
  • 18:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:47 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag)
  • 18:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:46 ryankemper: T288501 (Misread grafana graph, `wdqs2003` only has 1.33 hours to catch up on)
  • 18:45 ryankemper: T288501 `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled
  • 18:42 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:23 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s)
  • 17:19 ryankemper: T288501 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001` tmux session `wdqs_data_xfer`
  • 17:19 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 17:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:13 ryankemper: T288501 [WDQS] `ryankemper@wdqs2003:~$ sudo rm -fv /srv/wdqs/wikidata.jnl`
  • 17:09 razzi@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 17:09 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 17:06 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 17:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 17:01 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 16:49 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 16:47 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.18
  • 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81 (duration: 02m 10s)
  • 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d3c5363]: T287225: Bump rdf-spark-tools to 0.3.81
  • 16:33 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 16:33 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
  • 16:25 brennen: gitlab: run ansible to apply fix shell for backup cronjob (T288324)
  • 16:01 moritzm: installing c-ares security updates on buster
  • 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Reduce ten seconds from dispatch max time (T288175) (duration: 00m 58s)
  • 13:32 moritzm: updating bullseye installations to the latest state of testing
  • 13:19 moritzm: installing perl security updates on Bullseye (older distros not affected)
  • 13:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:54 ppchelko@deploy1002: Finished deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227 (duration: 37m 18s)
  • 12:42 lucaswerkmeister-wmde@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (3/3, test) (duration: 00m 57s)
  • 12:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (2/3, beta) (duration: 00m 57s)
  • 12:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWBRepoConceptBaseUri (T257260) (1/3, prod) (duration: 00m 57s)
  • 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBRepoSettings['conceptBaseUri'] (T257260) (duration: 00m 58s)
  • 12:23 kormat: non-destructive (🤞) testing of db-switchover against s2/eqiad T288500
  • 12:17 ppchelko@deploy1002: Started deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227
  • 11:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:56 marostegui: Install 10.4.21 on db1169 (s1)
  • 10:54 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:53 mutante: etherpad deleting 2 pads as requested in T288328
  • 10:52 marostegui: Install 10.4.21 on db1096 (s5 and s6)
  • 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:24 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientRepoDatabase (T257260) (2/2, beta) (duration: 00m 57s)
  • 09:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientRepoDatabase (T257260) (1/2, prod) (duration: 00m 57s)
  • 09:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repoDatabase'] (T257260) (duration: 00m 58s)
  • 09:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:23 ariel@deploy1002: Finished deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file (duration: 00m 03s)
  • 09:22 ariel@deploy1002: Started deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file
  • 09:17 kormat: running non-destructive test against s7/codfw (db2107/db2014) T288500
  • 09:05 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:04 moritzm: removing stale Java 8 packages from logstash1024/1025/2023/2024/2025 (ELK7 Logstash cluster is on Java 11 for a while now)
  • 09:00 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:58 ariel@deploy1002: Finished deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files (duration: 00m 03s)
  • 08:58 ariel@deploy1002: Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files
  • 08:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:15 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:15 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:06 godog: upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - T288326
  • 08:03 moritzm: installing openjdk-8 security updates on stretch
  • 07:33 moritzm: installing lynx security updates
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped T287454', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json
  • 05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - T287454
  • 04:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
  • 04:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454
  • 04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 T287454', diff saved to https://phabricator.wikimedia.org/P16981 and previous config saved to /var/cache/conftool/dbconfig/20210810-041627-root.json
  • 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-09

  • 16:12 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 16:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 16:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:07 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 16:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:02 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 16:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:00 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 16:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 15:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2065.codfw.wmnet
  • 15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2064.codfw.wmnet
  • 15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2062.codfw.wmnet
  • 14:17 sukhe: ran homer for Gerrit 710358: Set up BGP peering to doh5002 in eqsin
  • 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 14:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps100[1234].eqiad.wmnet
  • 14:06 jayme: re-enabled (and ran) puppet on all kubernetes nodes - T288345
  • 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 14:05 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2063.codfw.wmnet
  • 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 14:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:02 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 UCoC comment update (duration: 00m 58s)
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16979 and previous config saved to /var/cache/conftool/dbconfig/20210809-135805-root.json
  • 13:52 kormat: disabling puppet on all db hosts for roll-out of T285390
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 80%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16978 and previous config saved to /var/cache/conftool/dbconfig/20210809-134301-root.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 60%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16977 and previous config saved to /var/cache/conftool/dbconfig/20210809-132758-root.json
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 40%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16976 and previous config saved to /var/cache/conftool/dbconfig/20210809-131254-root.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 20%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16975 and previous config saved to /var/cache/conftool/dbconfig/20210809-125750-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 10%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16974 and previous config saved to /var/cache/conftool/dbconfig/20210809-124247-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2128 T288398', diff saved to https://phabricator.wikimedia.org/P16973 and previous config saved to /var/cache/conftool/dbconfig/20210809-123852-marostegui.json
  • 11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 11:53 jayme: running puppet on kubernetes staging nodes (-b1 -s10) - T288345
  • 11:50 jayme: disabling puppet on all kubernetes nodes - T288345
  • 11:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:44 Lucas_WMDE: EU backport+config window done
  • 11:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWikibaseClientRepoNamespaces (T257260) (duration: 00m 57s)
  • 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repoNamespaces'] (T257260) (duration: 00m 57s)
  • 11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove wmgWikibaseClientRepositories (T257260) (2/2, beta) (duration: 00m 56s)
  • 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove wmgWikibaseClientRepositories (T257260) (1/2, prod) (duration: 00m 57s)
  • 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting $wgWBClientSettings['repositories'] (T257260) (duration: 00m 57s)
  • 11:29 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
  • 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
  • 11:25 urbanecm: >>> \MediaWiki\MediaWikiServices::getInstance()->get('GrowthExperimentsWikiPageConfigLoader')->invalidate(Title::newFromText('MediaWiki:GrowthExperimentsConfig.json')) # dewiki shell.php; debugging Growth's wiki config
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/config/dewiki.yaml: d656435: dewiki: Enable Growth features in dark mode (T288420; 3/3) (duration: 00m 57s)
  • 11:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d656435: dewiki: Enable Growth features in dark mode (T288420; 2/3) (duration: 00m 57s)
  • 11:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d656435: dewiki: Enable Growth features in dark mode (T288420; 1/3) (duration: 00m 57s)
  • 11:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=dewiki --phab=T288420 # T288420
  • 11:15 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=dewiki growthexperiments # T288420
  • 11:15 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: 9b9bb5b: Disable local uploads for non-administrators on nlwiki (T288386) (duration: 00m 57s)
  • 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 037aceb: Enable GeoData on zhwikinews (T287807) (duration: 00m 57s)
  • 11:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster T288244
  • 11:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster T288244
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 54c532f: Add *.happysrv.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T288039) (duration: 00m 58s)
  • 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:36 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox constraint for commons wikis (T176312) (duration: 00m 57s)
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:31 awight@deploy1002: sync-file aborted: Config: [beta] Enable new VE template dialog sidebar (T286765) (duration: 00m 23s)
  • 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable post edit constraint jobs in all edits (T204031) (duration: 00m 58s)
  • 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase post edit constraint jobs to 85% of edits (T204031) (duration: 00m 58s)
  • 09:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
  • 09:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
  • 09:31 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[1234].codfw.wmnet
  • 08:46 godog: upgrade prometheus on prometheus2004 - T222113
  • 08:41 godog: upgrade prometheus on prometheus1004 - T222113
  • 08:36 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
  • 08:34 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
  • 08:24 marostegui: Upgrade db1117 (all sections) to 10.4.19
  • 08:03 ariel@deploy1002: Finished deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug (duration: 00m 03s)
  • 08:03 ariel@deploy1002: Started deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P16971 and previous config saved to /var/cache/conftool/dbconfig/20210809-075212-marostegui.json
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox for constraint for all of wikidata (T176312) (duration: 00m 58s)
  • 07:15 marostegui: Stop db1117:3323 to clone db1107 - T288197
  • 07:05 kart__: Updated cxserver to 2021-08-06-062053-production (T288272)
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
  • 07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
  • 06:53 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:45 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:56 XioNoX: enable cloudsw1-c8 interfaces toward cloudsw2-c8 - T277340
  • 05:23 marostegui: Lag in s4 (commonswiki) will appear on clouddb* hosts (wiki replicas) T288273
  • 05:23 marostegui: Optimize commonswiki.image on eqiad, lag will appear - T288273

2021-08-06

  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 brennen: gitlab: run ansible to apply remove backup warning for config backups (T288324)
  • 16:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts peek2001.codfw.wmnet
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
  • 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts peek2001.codfw.wmnet
  • 16:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
  • 16:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
  • 16:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:14 hnowlan: removing maps1005 from old maps cassandra cluster before reimaging
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
  • 14:26 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
  • 14:24 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
  • 13:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
  • 13:07 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:56 godog: test thanos 0.22 on thanos-fe2001 - T288326
  • 12:48 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 12:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:22 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:45 jayme: enabling dragonfly dfdaemon on kubernetes200*
  • 11:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
  • 11:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
  • 10:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 10:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
  • 09:58 kormat: reimaging db1181 (s7) to buster T288244
  • 09:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
  • 09:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
  • 09:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
  • 09:14 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:58 godog: test thanos 0.21 on thanos-fe2001 - T288326
  • 07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:36 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 07:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:15 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 07:02 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 06:43 marostegui: Reboot db1107 to upgrade its kernel
  • 05:47 marostegui: Optimize commonswiki.image on db1160 T288273
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P16965 and previous config saved to /var/cache/conftool/dbconfig/20210806-054433-marostegui.json
  • 05:44 eileen: civicrm revision changed from 931b3defbe to c132d2f943, config revision is 3696499932
  • 04:03 TimStarling: on mwmaint1002 mwscript extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php --wiki=mediawikiwiki --edit-count-table=bv2021_edits --list-name=board-vote-2021 --short-min-edits=20 --long-min-edits=300
  • 04:00 eileen: civicrm revision changed from e52f569991 to 931b3defbe, config revision is 3696499932
  • 03:54 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php: need to run this script T288025 (duration: 00m 57s)
  • 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
  • 00:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
  • 00:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
  • 00:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
  • 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:03 egardner@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/MediaSearch: Backport: Revert "Open search result links in-place" (duration: 00m 58s)
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-05

  • 23:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
  • 23:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
  • 23:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/includes/: Revert "Use CsrfTokenSet as CSRF token source" (T287542) (duration: 01m 03s)
  • 23:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
  • 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
  • 22:53 legoktm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/: Revert "Use CsrfTokenSet as CSRF token source" (T287542) (duration: 01m 02s)
  • 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:12 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/: T288191: Support deprecated Content::preSaveTransform override (2/2) (duration: 00m 55s)
  • 22:11 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/ContentHandler.php: T288191: Support deprecated Content::preSaveTransform override (1/2) (duration: 01m 00s)
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:41 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/skins/MonoBook/resources/screen-common.less: T288288 Restore visualClear style to MonoBook so that footer doesn't show in the interwiki list (duration: 01m 24s)
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:03 ejegg: updated payments-wiki from 72fe99abb1 to a70aaa7944
  • 20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
  • 20:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
  • 20:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:23 dduvall: 1.37.0-wmf.17 promoted to all wikis. no new errors or concerning rates (T281158). fixes for open UBN T288191 will be handled via backport (see task discussion)
  • 20:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.17
  • 19:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the ratio for shellbox for constraints to 42% in Wikidata (T176312) (duration: 01m 06s)
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the ratio for shellbox for constraints to 21% in Wikidata (T176312) (duration: 01m 06s)
  • 18:23 topranks: Adding peering to second router of Xiber LLC - AS393950 - on cr2-eqord (Equinix IX Chicago)
  • 18:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: da36bc3: DiscussionTools: Make sourcemodetoolbar available everywhere (T287927) (duration: 01m 06s)
  • 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0a14eb4: wikimediaEvents: Enable IP address copy action instrument on all wikis (T279540) (duration: 01m 07s)
  • 18:17 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/DiscussionTools/extension.json: 91f7c02: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 01m 06s)
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:16 urbanecm@deploy1002: sync-file aborted: 91f7c02: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 00m 04s)
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/DiscussionTools/extension.json: 38a8658: Change sourcemodetoolbar default to enabled when available (T287927) (duration: 01m 06s)
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Increase the shellbox ratio to 5% for wikidata (T176312) (duration: 01m 15s)
  • 17:43 elukey: upgrade helm3 to 3.6.3-1 on release*, contint*, chartmuseum*, deploy2002 (1002 was already done before)
  • 17:43 herron: rolling restart eqiad logstash cluster for java updates
  • 17:41 ebernhardson: restart airflow-{scheduler|webserver} on an-airflow1001 to pickup deployed plugin changes
  • 17:36 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774 (duration: 09m 01s)
  • 17:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:25 Amir1: end of pdf rebuild on commonswiki (T275268)
  • 17:23 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774
  • 17:15 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2006.codfw.wmnet
  • 16:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable shellbox for constraints for 1% of wikidata (T176312) (duration: 01m 27s)
  • 16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
  • 16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:15 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets (duration: 00m 22s)
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets
  • 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets (duration: 00m 25s)
  • 16:12 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets
  • 16:11 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets (duration: 00m 23s)
  • 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets
  • 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:10 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets (duration: 00m 29s)
  • 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets
  • 16:09 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets (duration: 00m 22s)
  • 16:09 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets
  • 16:04 hnowlan: draining maps1006 from maps cassandra cluster
  • 16:04 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere (duration: 00m 24s)
  • 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:03 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere
  • 16:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 16:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:02 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
  • 16:02 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
  • 16:01 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere
  • 16:01 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere (duration: 00m 55s)
  • 16:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:00 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere
  • 15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
  • 15:59 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere
  • 15:59 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere (duration: 00m 22s)
  • 15:58 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere
  • 15:57 mbsantos@deploy1002: deploy aborted: tegola: mirror 5% of requests everywhere (duration: 00m 03s)
  • 15:57 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846] (imposm): tegola: mirror 5% of requests everywhere
  • 15:54 herron: rolling restart codfw logstash elasticsearch cluster for java updates
  • 15:52 elukey: upgrade helm3 to 3.6.3-1 on deploy1002
  • 15:28 vgutierrez: pool lvs2009 - T286881
  • 15:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006 (duration: 00m 20s)
  • 15:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006
  • 15:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2009.codfw.wmnet with reason: T286881
  • 15:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2009.codfw.wmnet with reason: T286881
  • 15:11 vgutierrez: depool lvs2009 - T286881
  • 15:10 vgutierrez: pool lvs2008 - T286881
  • 14:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: T286881
  • 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: T286881
  • 14:52 vgutierrez: depool lvs2008 - T286881
  • 14:50 elukey: upload helm 3.6.3-1 to {buster,stretch}-wikimedia
  • 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
  • 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
  • 14:24 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5002.wikimedia.org
  • 14:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
  • 14:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
  • 14:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 14:12 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
  • 14:10 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
  • 14:00 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 13:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 13:39 mutante: deleted reserved (not active) IP 103.102.166.5/28 from netbox (T284246)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Add 'constraint-regex-checker' to isEnabled() check as well (T176312) (duration: 01m 06s)
  • 13:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Add 'constraint-regex-checker' to isEnabled() check as well (T176312) (duration: 01m 19s)
  • 13:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:54 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 11:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
  • 11:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 11:47 XioNoX: prepare cloudsw1-c8-eqiad for cloudsw2-c8 - T277340
  • 11:41 hnowlan: removing maps2006 from old maps cassandra cluster
  • 11:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
  • 11:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
  • 11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
  • 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
  • 11:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2006.codfw.wmnet
  • 11:07 topranks: Reconfiguring packet buffer partitioning on cloudsw-d5-eqiad T288037
  • 11:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:25 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add shellbox-constraint services and use them (T176312), Part III (duration: 01m 06s)
  • 10:24 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Add shellbox-constraint services and use them (T176312), Part II (duration: 01m 07s)
  • 10:23 ladsgroup@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: Add shellbox-constraint services and use them (T176312), Part I (duration: 01m 07s)
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:11 vgutierrez: restart acme-chief on acmechief1001
  • 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode2001.codfw.wmnet
  • 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:03 topranks: Reconfiguring packet buffer partitioning on cloudsw-c8-eqiad T288036
  • 10:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Route Shellbox requests to 'constraint-regex-checker' service (T176312) (duration: 01m 06s)
  • 09:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: Route Shellbox requests to 'constraint-regex-checker' service (T176312) (duration: 01m 27s)
  • 09:56 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode2001.codfw.wmnet
  • 09:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
  • 09:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
  • 09:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet,dc=codfw,cluster=docker-registry
  • 09:05 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:28 godog: bounce grafana to apply new settings - T119719
  • 08:00 marostegui: Failover m2 from db1107 to db1183 - T287852
  • 06:54 godog: prometheus/ops codfw +100G
  • 06:15 godog: add back thanos-be1003 sdf1 in thanos-swift
  • 04:03 ejegg: re-enabled fundraising scheduled jobs (process-control)
  • 03:03 ejegg: disabled fundraising scheduled jobs (process-control)
  • 02:50 TimStarling: on mwmaint1002 killing populateEditCount.php for loginwiki -- it's slow but it's not going to find any edits
  • 02:46 eileen: civicrm revision changed from d6baf291f4 to e52f569991, config revision is 360c8a1f08
  • 01:26 Krinkle: krinkle@mwmaint1002 Temporarily grant myself `translationadmin` on wikimania2016wiki in order to approve an edit given FlaggedRevs-like nature of Translate
  • 00:24 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove DynamicPageList from all Wikimania wikis except 2016 (T287916) (duration: 01m 52s)

2021-08-04

  • 22:18 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836 (duration: 06m 52s)
  • 22:11 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836
  • 20:56 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:21 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:22 dduvall: 1.37.0-wmf.17 promoted to group1. no new errors or troubling error rates spotted (T281158)
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.17 (duration: 01m 15s)
  • 19:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.17
  • 18:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/: 5c3ac58: Fix array key handling for GEHelpPanelLinks in on-wiki config (T288023) (duration: 01m 08s)
  • 18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
  • 18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
  • 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/: 36a2b9f: Fix array key handling for GEHelpPanelLinks in on-wiki config (T288023) (duration: 01m 06s)
  • 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:25 mutante: mw2379, mw2380 - scap pull
  • 18:16 brennen: gitlab1001: upgrading to 13.12.9
  • 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:11 brennen: gitlab2001: upgrading to 13.12.9
  • 18:10 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events by default - T287789 (duration: 01m 06s)
  • 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
  • 18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
  • 17:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
  • 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
  • 17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
  • 17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2380.codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw237[7-9].codfw.wmnet
  • 17:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
  • 17:41 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2357.codfw.wmnet
  • 17:40 mutante: mw2357, mw2377, mw2378 - scap pull
  • 17:40 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2357.codfw.wmnet
  • 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:29 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw238[1-2].codfw.wmnet
  • 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:27 ejegg: updated payments-wiki config to 360c8a1f08
  • 17:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
  • 17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
  • 17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
  • 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2355.codfw.wmnet
  • 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2353.codfw.wmnet
  • 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2351.codfw.wmnet
  • 17:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
  • 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: 66c2c75: updateMenteeData: Output how long the script took (T287964) (duration: 01m 07s)
  • 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
  • 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
  • 17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
  • 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
  • 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:55 mutante: mw2351, mw2353, mw2355 - scap pull
  • 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
  • 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
  • 16:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
  • 16:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
  • 16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
  • 16:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
  • 16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
  • 16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
  • 16:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
  • 16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
  • 16:21 joe: find . -type f -delete on /var/cache/nginx-docker-registry on registry2*, the disk is too small for unbound cache *and* accepting large uploads
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
  • 16:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
  • 16:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
  • 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
  • 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
  • 16:14 hnowlan: draining maps1008 from cassandra cluster
  • 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
  • 16:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
  • 16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
  • 16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
  • 16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
  • 15:58 mutante: mw2351, mw2353, mw2355, mw2357 - converting from appserver to jobrunner, mw2377, mw2378, mw2379, mw2380 - converting from jobrunner to appserver - for balancing of server types over rows
  • 15:51 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 15:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw237[789].codfw.wmnet
  • 15:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].codfw.wmnet
  • 15:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].wmnet
  • 14:30 godog: upgrade prometheus on cloudmetrics hosts - T222113
  • 14:28 godog: upgrade prometheus on prometheus4001 - T222113
  • 14:19 moritzm: imported gitlab-ce 13.12.9 to thirdparty/gitlab T287671
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:17 godog: depool prometheus2004 and pool prometheus2003 - T222113
  • 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226
  • 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) T286226
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:02 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d7255c: jvwikisource: Add author namespace (T286241) (duration: 01m 06s)
  • 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:19 urbanecm: jvwikisource was created (T286241)
  • 13:19 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 11s)
  • 13:18 volans: upgraded python3-wmflib to v0.0.9 fleet wide
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating jvwikisource (T286241) (duration: 01m 06s)
  • 13:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating jvwikisource (T286241) (duration: 01m 06s)
  • 13:10 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating jvwikisource (T286241) (duration: 01m 07s)
  • 13:09 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating jvwikisource (T286241)
  • 13:08 urbanecm@deploy1002: Synchronized dblists: Creating jvwikisource (T286241) (duration: 01m 07s)
  • 13:07 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating jvwikisource (T286241) (duration: 01m 07s)
  • 13:05 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating jvwikisource (T286241) (duration: 01m 08s)
  • 12:23 godog: depool prometheus2004 for upgrade - T222113
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16958 and previous config saved to /var/cache/conftool/dbconfig/20210804-120725-marostegui.json
  • 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:53 reedy@deploy1002: Synchronized docroot/mediawiki.org/xml/index.html: T288040 (duration: 01m 08s)
  • 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:43 moritzm: installing testvm2001 T286206
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317 and db1101:3317 T286888!', diff saved to https://phabricator.wikimedia.org/P16957 and previous config saved to /var/cache/conftool/dbconfig/20210804-113623-marostegui.json
  • 11:24 phuedx@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: Backport: Use real transactions when creating an election (duration: 01m 08s)
  • 11:21 phuedx@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: Backport: Use real transactions when creating an election (duration: 01m 19s)
  • 10:53 jayme: running puppet on eqiad appservers
  • 10:48 jayme: switch most eqiad appservers to appserver_dragonly role for testing - T286054
  • 10:29 jayme: importing dragonfly 1.0.6-1 (downgrade from 1.0.6-2) to buster-wikimedia and stretch-wikimedia - T286054
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 T286888', diff saved to https://phabricator.wikimedia.org/P16955 and previous config saved to /var/cache/conftool/dbconfig/20210804-101719-marostegui.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
  • 09:10 volans: uploaded python3-wmflib_0.0.9 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 08:55 legoktm@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=shellbox-constraints
  • 08:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 08:41 godog: pool prometheus1003 (and depool prometheus1004 for testing 1003 only) - T222113
  • 08:27 legoktm: restarting pybal on lvs2009 to add shellbox-constraints service
  • 08:24 legoktm: restarting pybal on lvs1015 to add shellbox-constraints service
  • 08:22 legoktm: restarting pybal on lvs2010 to add shellbox-constraints service
  • 08:18 legoktm: restarting pybal on lvs1016 to add shellbox-constraints service
  • 08:00 godog: upgrade prometheus1003 - T222113
  • 06:53 moritzm: installing testvm2002 T286206
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1174 and db1127 T286763', diff saved to https://phabricator.wikimedia.org/P16954 and previous config saved to /var/cache/conftool/dbconfig/20210804-064548-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3312, db1105:3312, db1105:3311 T286888', diff saved to https://phabricator.wikimedia.org/P16953 and previous config saved to /var/cache/conftool/dbconfig/20210804-060347-marostegui.json
  • 05:35 joe: docker image prune on releases1002, T288024
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16952 and previous config saved to /var/cache/conftool/dbconfig/20210804-050751-marostegui.json
  • 04:54 TimStarling: on mwmaint2002: running bv2021/populateEditCounts.php on all wikis with one thread per section s1-s8
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 to clone db1170:3312 T286888', diff saved to https://phabricator.wikimedia.org/P16950 and previous config saved to /var/cache/conftool/dbconfig/20210804-044507-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 to clone db1127 T286763', diff saved to https://phabricator.wikimedia.org/P16948 and previous config saved to /var/cache/conftool/dbconfig/20210804-043438-marostegui.json
  • 04:10 TimStarling: on mwmaint2002: creating bv2021_edits table on all wikis
  • 03:58 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 06s)
  • 03:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 18s)
  • 03:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 03:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-08-03

  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable commonswiki sister search (T277225) (duration: 01m 07s)
  • 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for T287988 (T281158)
  • 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 3/3) (duration: 01m 07s)
  • 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 07s)
  • 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 00m 37s)
  • 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 00m 37s)
  • 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 08s)
  • 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
  • 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
  • 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
  • 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
  • 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
  • 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 ryankemper: T285355 `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
  • 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
  • 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
  • 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
  • 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
  • 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
  • 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer (T286853)
  • 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
  • 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization (duration: 00m 48s)
  • 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization
  • 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
  • 16:59 hashar: Gerrit has been upgraded
  • 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
  • 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
  • 16:45 urbanecm: Start server side upload for 1 video file (T287957)
  • 16:45 hashar: Stopping Gerrit for upgrade
  • 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
  • 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
  • 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
  • 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
  • 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
  • 12:47 moritzm: restarting Tomcat on idp1001
  • 12:05 moritzm: installing libgcrypt20 security updates
  • 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 11:36 moritzm: updated bullseye d-i images to rc3 T275873
  • 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:13 moritzm: rename Ganeti group for test cluster to row_D T286206
  • 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 09:18 marostegui: Failover m1, m2 and m3-master T287574
  • 09:12 moritzm: installinh php 7.0 security updates on stretch
  • 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054
  • 08:57 moritzm: installing pillow security updates on stretch
  • 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
  • 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
  • 06:31 kart__: Updated cxserver to 2021-08-02-164000-production (T286473)
  • 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
  • 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
  • 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
  • 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
  • 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
  • 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
  • 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)

2021-08-02

  • 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:21 legoktm: Previous sync also deployed c38998f03f "Stop enabling DPL on new wikis" (T287380)
  • 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
  • 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
  • 21:31 tzatziki: removing 1 file for legal compliance
  • 21:16 tzatziki: removing 7 files for legal compliance
  • 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287868, T287874, T287873)
  • 19:00 urbanecm: Morning B&C window completed
  • 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 2/2) (duration: 00m 56s)
  • 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 1/2) (duration: 00m 57s)
  • 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - T287652 (duration: 00m 56s)
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 2/2) (duration: 00m 56s)
  • 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 1/2) (duration: 00m 56s)
  • 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ee47f9d: Add rollbacker group for kswiki (T286789) (duration: 00m 56s)
  • 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eec997c: Enable SUL autologin for wikimania.wikimedia.org (T285197) (duration: 00m 55s)
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 2/2) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 1/2) (duration: 00m 57s)
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cc8ca45: Add tewikisource as import source for tewikibooks (T286978) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11e96ba: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287264) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: 97b6897: Remove unused enwiki celebration logos (T272108) (duration: 00m 57s)
  • 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 16f9794: Remove unused eswiki celebration logos (T280908) (duration: 00m 57s)
  • 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 15:44 jynus: remove s2 from db1139 T287230
  • 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:02 mutante: gerrit1001 - restarting service after 706049
  • 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
  • 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
  • 12:20 mutante: gerrit servers: disabling puppet
  • 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: T287528 (duration: 00m 57s)
  • 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287780 (duration: 00m 57s)
  • 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
  • 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287782 (duration: 00m 56s)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
  • 11:29 hashar: restarting gerrit primary server on gerrit1001
  • 11:27 hashar: restarting Jenkins on contint2001
  • 11:27 hashar: restarting Jenkins on contint1001
  • 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
  • 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 urbanecm: EU B&C window completed
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43020b7: votewiki: Enable Single Transferable Vote (T283728) (duration: 00m 57s)
  • 11:08 moritzm: installing openjdk-11 security updates
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26bcaaf: Restore logging for mediamoderation script to better understand high error rate occurring when running script (T287511) (duration: 00m 57s)
  • 07:53 moritzm: catch up bullseye installs with latest state of testing
  • 07:24 moritzm: installing libsndfile security updates on buster
  • 07:12 moritzm: installing aspell security updates
  • 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)

Archives

See Server Admin Log/Archives.