Difference between revisions of "Server Admin Log"

imported>Stashbot
(legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction'))
imported>Stashbot
(urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53bfd7a4d9b11f865df14f8a72adb97f33b: Update messages used for tech CoC (T280886) (duration: 00m 56s))
== 2021-05-10 ==
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|779fb53bfd7a4d9b11f865df14f8a72adb97f33b}}: Update messages used for tech CoC ([[phab:T280886|T280886]]) (duration: 00m 56s)
* 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|ba8b786c7f3a290f0747a6859fd07502eb83108f}}: NO-OP: Enable ChessBrowser on beta ([[phab:T244075|T244075]]) (duration: 00m 57s)
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|dd6fa6504350a90c9f14c218bc972558791f0a6d}}: Use ptwiki 20th anniversary logos ([[phab:T281925|T281925]]) (duration: 00m 59s)
* 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|f2a76b1a6eb55749395e67d74c74a7fc5df52f1b}}: Add ptwiki 20th anniversary logos ([[phab:T281925|T281925]]) (duration: 00m 58s)
* 22:28 eileen: civicrm revision changed from {{Gerrit|2052d79248}} to {{Gerrit|38ac15233f}}, config revision is {{Gerrit|47f21e4568}}
* 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: [[gerrit:688295{{!}}Manually include I18nUtils class (T282206)]] (duration: 00m 56s)
* 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: [[gerrit:688294{{!}}Manually include I18nUtils class (T282206)]] (duration: 01m 01s)
* 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
* 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 [[phab:T282348|T282348]]
* 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
* 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
* 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 02m 07s)
* 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 01m 55s)
* 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 01m 21s)
* 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:29 andrew@deploy1002: deploy aborted: update horizon to fix [[phab:T282489|T282489]] (duration: 00m 36s)
* 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:29 andrew@deploy1002: deploy aborted: update horizon to fix [[phab:T282489|T282489]] (duration: 00m 15s)
* 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]] (duration: 04m 10s)
* 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix [[phab:T282489|T282489]]
* 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:632598{{!}}loginwiki: Allow users to mark Notifications as read (T264834)]] (duration: 00m 57s)
* 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:677325{{!}}Disable LocalisationUpdate, part I (T158360)]] (duration: 00m 58s)
* 18:24 XioNoX: add cmooney to all network devices
* 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679940{{!}}[wikitech] Enable VE desktop section edit links (T280291)]] (duration: 00m 57s)
* 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:657697{{!}}wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712)]] (duration: 00m 57s)
* 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:673306{{!}}FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored]] (duration: 00m 57s)
* 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
* 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
* 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:688281{{!}}arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565)]] (duration: 00m 59s)
* 15:20 elukey: restart rsyslog on rpki1001
* 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
* 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
* 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
* 11:46 Urbanecm: EU B&C window done
* 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3418237fdbe3eaff409bb23bf97fbba51e60337a}}: Disabling Education Program namespaces in Russian Wikipedia ([[phab:T282112|T282112]]) (duration: 00m 57s)
* 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8bef11c3048683663e6edc38e21cd6d6d1192eb7}}: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T282007|T282007]]) (duration: 00m 57s)
* 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # [[phab:T262155|T262155]]
* 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # [[phab:T262155|T262155]]
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|068cd7e41e339acf72fb81d4fcc3b86292209fe3}}: Change namespace name and aliases on jawikivoyage ([[phab:T262155|T262155]]) (duration: 00m 57s)
* 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9209d96560777cf6747d57855c7b525e702664d7}}: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies ([[phab:T281968|T281968]]) (duration: 00m 57s)
* 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7f6f8497cdfba6d766e3e6974ee15a492f0518ac}}: Add tmpSerializeEmptyListsAsObjects to Wikibase.php ([[phab:T241422|T241422]]) (duration: 01m 01s)
* 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6138c64e7c13fbc52ad084c0901bdd2ab30ad953}}: Add tmpSerializeEmptyListsAsObjects Wikibase repo config ([[phab:T241422|T241422]]) (duration: 00m 57s)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|23271ddb555b44c2c9659c32907fdeff2a768916}}: Enable ReferencePreviews as full default on Marathi wiki ([[phab:T282147|T282147]]) (duration: 00m 57s)
* 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: {{Gerrit|bd28391f807d6205875cad0d049760c0e606de24}}: DatabaseBlockStore: fetch correct ActorNormalization (3/3; [[phab:T281972|T281972]]) (duration: 00m 56s)
* 11:08 urbanecm@deploy1002: sync-file aborted: {{Gerrit|bd28391f807d6205875cad0d049760c0e606de24}}: DatabaseBlockStore: fetch correct ActorNormalization ([[phab:T281972|T281972]]) (duration: 00m 04s)
* 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: {{Gerrit|85dc711dee753ad8302a431369d7814efb2785d1}}: DatabaseBlockStore: fetch correct ActorNormalization (2/3; [[phab:T281972|T281972]]) (duration: 00m 56s)
* 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: {{Gerrit|85dc711dee753ad8302a431369d7814efb2785d1}}: DatabaseBlockStore: fetch correct ActorNormalization (1/3; [[phab:T281972|T281972]]) (duration: 00m 57s)
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
* 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
* 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:688214{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:688214{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 10:31 moritzm: installing openjdk-11 security updates
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
* 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
* 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
* 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
* 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
* 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - [[phab:T281673|T281673]]
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 [[phab:T281959|T281959]]', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
* 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
* 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
* 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - [[phab:T281673|T281673]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
* 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
* 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
* 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
* 08:24 XioNoX: push pfw policies - [[phab:T282286|T282286]]
* 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
* 07:38 hashar: Restarted CI Jenkins # [[phab:T281737|T281737]]
* 06:37 elukey: apt-get clean on rpki1001 to free some space
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl [[phab:T281794|T281794]]', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json
== 2021-05-09 ==
* 21:44 legoktm: restarted mailman3 again ([[phab:T282348|T282348]]) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

Revision as of 23:38, 10 May 2021

2021-05-10

  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53: Update messages used for tech CoC (T280886) (duration: 00m 56s)
  • 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: ba8b786: NO-OP: Enable ChessBrowser on beta (T244075) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: dd6fa65: Use ptwiki 20th anniversary logos (T281925) (duration: 00m 59s)
  • 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: f2a76b1: Add ptwiki 20th anniversary logos (T281925) (duration: 00m 58s)
  • 22:28 eileen: civicrm revision changed from 2052d79248 to 38ac15233f, config revision is 47f21e4568
  • 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 00m 56s)
  • 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 01m 01s)
  • 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
  • 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 T282348
  • 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
  • 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 02m 07s)
  • 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 55s)
  • 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 21s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 36s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 15s)
  • 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 04m 10s)
  • 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: loginwiki: Allow users to mark Notifications as read (T264834) (duration: 00m 57s)
  • 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Disable LocalisationUpdate, part I (T158360) (duration: 00m 58s)
  • 18:24 XioNoX: add cmooney to all network devices
  • 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Enable VE desktop section edit links (T280291) (duration: 00m 57s)
  • 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712) (duration: 00m 57s)
  • 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored (duration: 00m 57s)
  • 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
  • 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
  • 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565) (duration: 00m 59s)
  • 15:20 elukey: restart rsyslog on rpki1001
  • 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
  • 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
  • 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 11:46 Urbanecm: EU B&C window done
  • 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3418237: Disabling Education Program namespaces in Russian Wikipedia (T282112) (duration: 00m 57s)
  • 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8bef11c: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T282007) (duration: 00m 57s)
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # T262155
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # T262155
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 068cd7e: Change namespace name and aliases on jawikivoyage (T262155) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9209d96: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies (T281968) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7f6f849: Add tmpSerializeEmptyListsAsObjects to Wikibase.php (T241422) (duration: 01m 01s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6138c64: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T241422) (duration: 00m 57s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 23271dd: Enable ReferencePreviews as full default on Marathi wiki (T282147) (duration: 00m 57s)
  • 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (3/3; T281972) (duration: 00m 56s)
  • 11:08 urbanecm@deploy1002: sync-file aborted: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (T281972) (duration: 00m 04s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (2/3; T281972) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (1/3; T281972) (duration: 00m 57s)
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
  • 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:31 moritzm: installing openjdk-11 security updates
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
  • 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - T281673
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T281959', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
  • 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
  • 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - T281673
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
  • 08:24 XioNoX: push pfw policies - T282286
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
  • 07:38 hashar: Restarted CI Jenkins # T281737
  • 06:37 elukey: apt-get clean on rpki1001 to free some space
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl T281794', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json

2021-05-09

  • 21:44 legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
  • 18:28 legoktm: systemctl restart mailman3, bounce runner died again (T282348)
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 09:16 legoktm: mailman3 live hacked patch at https://phabricator.wikimedia.org/T282348#7072358 to fix bounce queue
  • 06:21 legoktm: restarting mailman3 service, bounce runner died
  • 04:27 Amir1: starting upgrade of batch H of mailing lists (T280322)

2021-05-08

  • 17:18 Amir1: starting upgrade of batch G of mailing lists (T280322)

2021-05-07

  • 21:40 legoktm: deleted education@ from MM3, didn't import properly
  • 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
  • 21:33 legoktm: fixed owner for wdqs-gui-build list
  • 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
  • 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 18:23 brennen: 1.37.0-wmf.4 train status (T281145): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
  • 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: LinkBatch: skip bad input (T282180 T282070) (duration: 01m 06s)
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
  • 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
  • 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
  • 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
  • 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
  • 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
  • 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
  • 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
  • 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 13:04 Urbanecm: Start server-side upload for 1 video file (T281927)
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
  • 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
  • 09:55 dcausse: depooling wdqs1012 T280382, T282222
  • 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - T281673
  • 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - T281673
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
  • 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear T266486 T268392 T273360
  • 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)
  • 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T282093', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message; will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)
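
Many of the `dbctl commit` entries above record a host being repooled in percentage steps. A minimal sketch for extracting those steps per host from log lines follows, assuming the message format seen in this log (`'<host> (re)pooling @ <N>%: ...'`). The helper name `repool_steps` is hypothetical, and the sample lines are abbreviated copies of 2021-05-05 entries with the trailing diff and config paths cut.

```python
import re

# Matches e.g. 'db1126 (re)pooling @ 20%' or 'db1096:3316 (re)pooling @ 100%'.
REPOOL = re.compile(r"'(?P<host>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%")


def repool_steps(lines):
    """Map each host to the sorted list of pooling percentages logged for it."""
    steps = {}
    for line in lines:
        match = REPOOL.search(line)
        if match:
            steps.setdefault(match.group("host"), []).append(int(match.group("pct")))
    return {host: sorted(pcts) for host, pcts in steps.items()}


# Abbreviated entries from the log (newest first, as in the log itself).
sample = [
    "14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126'",
    "14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126'",
    "14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126'",
    "14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host'",
    "07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change'",
]

print(repool_steps(sample))
```

Depool commits and other entries do not match the pattern and are skipped, so the result lists only the repooling steps.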

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123) GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

Archives

See Server Admin Log/Archives.