You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

imported>Stashbot
(legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant (T272526) (duration: 00m 57s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
 
(243 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2021-01-23 ==
== 2021-10-21 ==
* 00:46 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant ([[phab:T272526|T272526]]) (duration: 00m 57s)
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:36 legoktm@deploy1001: Synchronized static/images/project-logos/: Add enwiki20 "Option A" fixed logos ([[phab:T272526|T272526]]) (duration: 00m 59s)
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-01-22 ==
== 2021-10-20 ==
* 22:41 reedy@deploy1001: Synchronized invalid.json: (no justification provided) (duration: 00m 58s)
* 23:56 thcipriani@deploy1002: Finished scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]] (duration: 11m 41s)
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 23:44 thcipriani@deploy1002: Started scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]]
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace [[phab:T291018|T291018]] (duration: 01m 02s)
* 20:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace [[phab:T291018|T291018]] (duration: 01m 04s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
* 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 21:50 dancy: Testing a series of one-file scap sync-file runs
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet
* 21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9cf996a38d82fdd67e600a5a951e88423957e8d}}: Promote Growth features out of darkmode on several wikis  ([[phab:T291826|T291826]], [[phab:T255037|T255037]], [[phab:T287878|T287878]]) (duration: 01m 04s)
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet
* 20:38 eileen: civicrm revision changed from {{Gerrit|9b5e0d015b}} to {{Gerrit|e889831012}}, config revision is {{Gerrit|eed79486d5}}
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet
* 20:25 legoktm: uploaded php7.4 on buster to apt.wm.o ([[phab:T293449|T293449]])
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet
* 19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
* 19:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
* 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 19:09 mutante: disabling puppet on mw* for a minute to deploy a change
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:09 mutante: releases1002 systemctl reset-failed
* 18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org -  [[phab:T293810|T293810]]
* 19:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet
* 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet
* 17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
* 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet
* 17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - [[phab:T293860|T293860]] (duration: 01m 03s)
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet
* 16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet
* 16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - [[phab:T293895|T293895]] (duration: 01m 03s)
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet
* 16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - [[phab:T293894|T293894]] (duration: 01m 09s)
* 18:17 mutante: releases2002 - rebooting to confirm works now and also new disk gets auto-mounted
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:03 mutante: releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again
* 16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
* 17:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 16:13 jbond: upload cas_6.4.2-1_amd64.deb
* 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:57 mutante: releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into [[phab:T272555|T272555]] but if it does now it's known how to fix
* 15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 14:57 moritzm: installing modsecurity-crs security updates on Buster
* 17:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 14:48 moritzm: installing xmlgraphics-commons security updates on Buster
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 14:46 moritzm: installing irssi security updates on Buster
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 17:52 mutante: releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 14:35 moritzm: installing commons-io security updates on Buster
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 14:27 ema: cp3062: test higher vsl_space values [[phab:T293879|T293879]]
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 17:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]] (duration: 65m 37s)
* 14:12 moritzm: installing ruby2.3 security updates
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 13:40 moritzm: installing apache2 security updates on buster
* 17:29 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:29 mforns@deploy1001: Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:23 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s)
* 13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 01m 02s)
* 17:13 mforns@deploy1001: Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]]
* 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 16:40 cmjohnson1: replacing optics/fiber  pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 [[phab:T271295|T271295]]
* 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 16:19 jynus: restart of backup source hosts on codfw [[phab:T271913|T271913]]
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
* 15:54 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
* 15:40 moritzm: installing puppetboard1002
* 12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M [[phab:T293879|T293879]] - varnish restart needed
* 15:24 moritzm: installing puppetboard2002
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 13:44 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13932 and previous config saved to /var/cache/conftool/dbconfig/20210122-134444-kormat.json
* 12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13931 and previous config saved to /var/cache/conftool/dbconfig/20210122-133341-marostegui.json
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:31 marostegui: Stop replication on db1121
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13930 and previous config saved to /var/cache/conftool/dbconfig/20210122-133044-marostegui.json
* 12:02 urbanecm@deploy1002: Finished scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]]) (duration: 25m 19s)
* 13:29 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13929 and previous config saved to /var/cache/conftool/dbconfig/20210122-132939-kormat.json
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2002.codfw.wmnet
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13927 and previous config saved to /var/cache/conftool/dbconfig/20210122-132028-kormat.json
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
* 13:14 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13926 and previous config saved to /var/cache/conftool/dbconfig/20210122-131436-kormat.json
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13925 and previous config saved to /var/cache/conftool/dbconfig/20210122-130525-kormat.json
* 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 12:59 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13924 and previous config saved to /var/cache/conftool/dbconfig/20210122-125932-kormat.json
* 11:37 urbanecm@deploy1002: Started scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]])
* 12:54 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard2002.codfw.wmnet
* 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
* 12:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1002.eqiad.wmnet
* 11:21 moritzm: installing ffmpeg security updates
* 12:50 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13923 and previous config saved to /var/cache/conftool/dbconfig/20210122-125021-kormat.json
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e520fc57411bb19123766192cd636396ea6fc59d}}: GrowthExperiments: Add campaign pattern for enwiki ([[phab:T293699|T293699]]) (duration: 01m 22s)
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'db1149 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13922 and previous config saved to /var/cache/conftool/dbconfig/20210122-124748-kormat.json
* 11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 12:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:43 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1110 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13921 and previous config saved to /var/cache/conftool/dbconfig/20210122-124310-kormat.json
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
* 12:38 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard1002.eqiad.wmnet
* 10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 12:38 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1127 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13920 and previous config saved to /var/cache/conftool/dbconfig/20210122-123832-kormat.json
* 10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13919 and previous config saved to /var/cache/conftool/dbconfig/20210122-123518-kormat.json
* 09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 12:33 volker-e@deploy1001: Finished deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424) (duration: 00m 07s)
* 09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 12:33 volker-e@deploy1001: Started deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424)
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1135,1137].eqiad.wmnet
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 12:08 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1135,1137].eqiad.wmnet
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 12:00 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13918 and previous config saved to /var/cache/conftool/dbconfig/20210122-120011-kormat.json
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 11:54 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 11:51 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13917 and previous config saved to /var/cache/conftool/dbconfig/20210122-115113-kormat.json
* 08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 11:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:46 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13916 and previous config saved to /var/cache/conftool/dbconfig/20210122-114642-kormat.json
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
* 11:45 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13915 and previous config saved to /var/cache/conftool/dbconfig/20210122-114507-kormat.json
* 07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
* 11:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
* 11:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
* 11:36 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13914 and previous config saved to /var/cache/conftool/dbconfig/20210122-113610-kormat.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
* 11:31 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13913 and previous config saved to /var/cache/conftool/dbconfig/20210122-113139-kormat.json
* 06:35 marostegui: Upgrade db1106
* 11:30 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13912 and previous config saved to /var/cache/conftool/dbconfig/20210122-113004-kormat.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 06:31 dcausse: restarting blazegraph on wdqs1012
* 11:24 kormat@cumin1001: dbctl commit (dc=all): 'es1023 depooling: enable report_host [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13911 and previous config saved to /var/cache/conftool/dbconfig/20210122-112424-kormat.json
* 06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
* 11:24 hnowlan: joining restbase2009-a to cluster
* 06:21 marostegui: Depool clouddb1013 for upgrade
* 11:21 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13910 and previous config saved to /var/cache/conftool/dbconfig/20210122-112106-kormat.json
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
* 06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13909 and previous config saved to /var/cache/conftool/dbconfig/20210122-111635-kormat.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
* 11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
* 06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13908 and previous config saved to /var/cache/conftool/dbconfig/20210122-111500-kormat.json
* 06:05 XioNoX: put transport link between ulsfo and eqsin in service - [[phab:T273308|T273308]]
* 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
* 11:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
* 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13906 and previous config saved to /var/cache/conftool/dbconfig/20210122-110603-kormat.json
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 jbond42: deploy cairo updates to jessie
* 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13905 and previous config saved to /var/cache/conftool/dbconfig/20210122-110229-kormat.json
* 04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis ([[phab:T288848|T288848]]) (duration: 01m 05s)
* 11:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13904 and previous config saved to /var/cache/conftool/dbconfig/20210122-110132-kormat.json
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'db1136 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13903 and previous config saved to /var/cache/conftool/dbconfig/20210122-105952-kormat.json
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 00:00 tgr: west coast evening deploys done
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1127 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13902 and previous config saved to /var/cache/conftool/dbconfig/20210122-105921-kormat.json
* 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db1134 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13901 and previous config saved to /var/cache/conftool/dbconfig/20210122-105636-kormat.json
* 10:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1088 from api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13900 and previous config saved to /var/cache/conftool/dbconfig/20210122-105345-kormat.json
* 10:52 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13899 and previous config saved to /var/cache/conftool/dbconfig/20210122-105244-kormat.json
* 10:37 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13898 and previous config saved to /var/cache/conftool/dbconfig/20210122-103741-kormat.json
* 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13897 and previous config saved to /var/cache/conftool/dbconfig/20210122-103609-kormat.json
* 10:22 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13895 and previous config saved to /var/cache/conftool/dbconfig/20210122-102237-kormat.json
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13894 and previous config saved to /var/cache/conftool/dbconfig/20210122-102105-kormat.json
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
* 10:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
* 10:07 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13893 and previous config saved to /var/cache/conftool/dbconfig/20210122-100734-kormat.json
* 10:06 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13892 and previous config saved to /var/cache/conftool/dbconfig/20210122-100602-kormat.json
* 10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1130 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13891 and previous config saved to /var/cache/conftool/dbconfig/20210122-100307-kormat.json
* 10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:02 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1110 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13890 and previous config saved to /var/cache/conftool/dbconfig/20210122-100233-kormat.json
* 09:52 moritzm: uploaded cairo 1.14.0-2.1+deb8u2+wmf1 to apt.wikimedia.org
* 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13889 and previous config saved to /var/cache/conftool/dbconfig/20210122-095058-kormat.json
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db1093 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13888 and previous config saved to /var/cache/conftool/dbconfig/20210122-094453-kormat.json
* 09:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:43 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1088 to api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13887 and previous config saved to /var/cache/conftool/dbconfig/20210122-094337-kormat.json
* 08:49 moritzm: installing PIP security updates for stretch
* 08:44 moritzm: installing mutt updates for stretch
* 08:35 XioNoX: Remove BGP for Zayo transit in ulsfo, eqiad, eqord
* 08:33 elukey: update puppet compiler's facts
* 07:26 ryankemper: [WDQS Deploy] WDQS deploy complete; service is healthy
* 06:59 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:58 ryankemper: [WDQS Deploy] Initial deploy complete, `query.wikidata.org` handles queries fine, proceeding to post-deploy steps
* 06:57 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 10m 43s)
* 06:50 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` following canary WDQS deploy, proceeding to rest of fleet
* 06:46 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 06:46 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` before WDQS deploy, beginning deploy
* 06:45 ryankemper: [wdqs] re-pooled `wdqs1013` (all caught up on lag)
* 06:16 marostegui: Stop MySQL on db1117 db2133 db2078 [[phab:T272614|T272614]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2143 and db2144 as x2 codfw slaves [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13885 and previous config saved to /var/cache/conftool/dbconfig/20210122-060147-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2142 into x2 as codfw master [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13884 and previous config saved to /var/cache/conftool/dbconfig/20210122-060007-marostegui.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight', diff saved to https://phabricator.wikimedia.org/P13883 and previous config saved to /var/cache/conftool/dbconfig/20210122-054330-marostegui.json
* 01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2368.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2366.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2368.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2366.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
* 01:19 Urbanecm: Evening B&C window finished
* 01:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/: {{Gerrit|7d8ab70d5b00142e8344e242dd085eb7bfa81145}}: Don't return the status of doBlockInternal when processing block actions (duration: 00m 59s)
* 01:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|376cba1b33dd68d40490a1498c59a4d430318ab1}}: Enroll idwiki in the DiscussionTools a/b test ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/DiscussionTools/: {{Gerrit|513a7861bbcf06a8ac5c29e1b9838640cbd7c628}}: A/B test output when a specific feature is being tested ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/WikibaseMediaInfo/: {{Gerrit|4b0259b761681ca90b3f3039019553ddca40a5fe}}: Distinguish between null continue value and unknown one ([[phab:T272548|T272548]]) (duration: 00m 59s)
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2376.codfw.wmnet
* 01:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 01:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
* 01:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 01:00 Urbanecm: Evening B&C still in process, waiting on Zuul
* 00:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 00:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2372.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2370.codfw.wmnet
* 00:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 00:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4f5d6f09977962be1c49471432125a92357ede6}}: Temporarily amend ukwiki AF configuration ([[phab:T272330|T272330]]) (duration: 01m 03s)
* 00:20 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/MobileFrontend: Backport: [[gerrit:657702{{!}}Fix toggling storage cleanup (T272638)]] (duration: 01m 07s)
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2372.codfw.wmnet
* 00:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2370.codfw.wmnet
* 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster
* 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster


== 2021-01-21 ==
== 2021-10-19 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103{{!}}Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s)
* 23:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053{{!}}ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s)
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:10 brennen: 1.36.0-wmf.27 train status: for avoidance of doubt, no deploys until further notice - sorting out [[phab:T272638|T272638]]
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565{{!}}Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s)
* 21:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.26
* 23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953{{!}}Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s)
* 20:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac99da75f9507e19472ab3020be638262857ec07}}: Migrate WebUIActionsTracking schemas to Event Platform on testwiki ([[phab:T267347|T267347]]; [[phab:T271164|T271164]]) (duration: 01m 03s)
* 23:13 tgr@deploy1002: Synchronized static: Config: [[gerrit:731231{{!}}Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s)
* 19:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bb9e5d13be702516368774732a9e1711bec42e5}}: Enables the Wikisource extension on oldwikisource ([[phab:T272163|T272163]]) (duration: 01m 04s)
* 21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete {{!}} fixed Icinga alert: RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: [[phab:T165885|T165885]]
* 19:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/EventLogging/: {{Gerrit|ee830a5ec2051fa970084e89b477a44c384e309c}}: {{Gerrit|f7152a74e00404fc561c44d1c2e37d7f882e2f52}}: EventLogging backport, see commits for details ([[phab:T253121|T253121]]) (duration: 01m 05s)
* 21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2226.codfw.wmnet
* 20:56 ejegg: updated payments-wiki from {{Gerrit|0f48acea49}} to {{Gerrit|30e596903d}}
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2375.codfw.wmnet
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5 refs [[phab:T281169|T281169]]
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2373.codfw.wmnet
* 18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: {{Gerrit|a84a675}}: {{Gerrit|3231578}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2371.codfw.wmnet
* 18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: {{Gerrit|694580a}}: {{Gerrit|c02e301}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
* 18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 19:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62c9c35a76e2d065922f8c9f5a58672240dea7de}}: Migrate SuggestedTagsAction to Event Platform on all wikis ([[phab:T267351|T267351]]) (duration: 01m 03s)
* 18:30 foks: deleting 1 more email with deleteUserEmail.php
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0b46c9f1f75fc773f57bfa70521c9eaf20410b9e}}: [no-op] Add notes about load order of Wikisource and Collection extensions ([[phab:T255790|T255790]]) (duration: 01m 11s)
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1476a2d93}}: {{Gerrit|dd8393c1a0}}: foundationwiki: Restrict sensitive namespaces to editor group ([[phab:T205350|T205350]]) (duration: 01m 03s)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2375.codfw.wmnet
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2373.codfw.wmnet
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a2893c7190e615a247674dbf7f87348bf43b91c}}: Enable topic subscriptions as a beta feature on all remaining projects ([[phab:T287802|T287802]]) (duration: 01m 04s)
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2371.codfw.wmnet
* 18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (2/2) (duration: 01m 06s)
* 19:02 cstone: civicrm revision changed from {{Gerrit|a4caad22b1}} to {{Gerrit|3afb54f6f9}}
* 17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (1/2) (duration: 01m 05s)
* 18:53 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
* 18:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 18:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 18:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 18:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 18:14 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 18:08 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 18:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 17:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 17:35 ryankemper: [wdqs] Depooled `wdqs1013` to allow it to catch up on lag
* 15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 16:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - [[phab:T277193|T277193]] (duration: 01m 04s)
* 16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
* 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
* 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:13 moritzm: installing cairo security updates on stretch
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 15:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
* 15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:17 godog: roll-restart swift-object in eqiad to apply new concurrency
* 15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4002.wikimedia.org
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4002.wikimedia.org
* 14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
* 14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while I merge G:662699
* 13:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:38 XioNoX: put eqiad/esams lumen link back in service
* 14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 45m 13s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13872 and previous config saved to /var/cache/conftool/dbconfig/20210121-122043-root.json
* 13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13871 and previous config saved to /var/cache/conftool/dbconfig/20210121-120540-root.json
* 13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13870 and previous config saved to /var/cache/conftool/dbconfig/20210121-115036-root.json
* 13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13868 and previous config saved to /var/cache/conftool/dbconfig/20210121-113533-root.json
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 marostegui: Stop replication on db1085 to move wiki replicas under the other sanitarium host
* 13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P13867 and previous config saved to /var/cache/conftool/dbconfig/20210121-112849-marostegui.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
* 09:44 hoo: Updated the Wikidata property suggester with data from the 2021-01-11 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
* 09:00 marostegui: m1 master restart - [[phab:T271540|T271540]]
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
* 08:51 jynus: stopping puppet and bacula for backup1001 [[phab:T271540|T271540]]
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
* 08:43 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 12:40 moritzm: installing aftpd security updates
* 08:37 marostegui: Silence m1 hosts in preparation for the restart [[phab:T271540|T271540]]
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
* 08:34 godog: roll-restart swift-object in codfw to apply new concurrency
* 12:34 marostegui: Upgrade dbstore1003
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13864 and previous config saved to /var/cache/conftool/dbconfig/20210121-072101-marostegui.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13863 and previous config saved to /var/cache/conftool/dbconfig/20210121-070346-marostegui.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13862 and previous config saved to /var/cache/conftool/dbconfig/20210121-065459-marostegui.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P13861 and previous config saved to /var/cache/conftool/dbconfig/20210121-065408-marostegui.json
* 12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - [[phab:T288843|T288843]]
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and pool db1099:3318 into s8 vslow', diff saved to https://phabricator.wikimedia.org/P13860 and previous config saved to /var/cache/conftool/dbconfig/20210121-064903-marostegui.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
* 03:54 milimetric@deploy1001: deploy aborted: Minor typo fix (duration: 01m 39s)
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
* 03:52 milimetric@deploy1001: Started deploy [analytics/refinery@57589e7]: Minor typo fix
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
* 01:27 ryankemper: [WDQS Deploy] Rollback complete, service health of `wdqs1003` is restored. Need to investigate source of 404 (possibly related to some recent changes we made in the `gui` repo)
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
* 01:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 02m 53s)
* 12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: {{Gerrit|ec0125770775c1a1a54c3b592d86d287fd9e3ad6}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 55s)
* 01:26 ryankemper: [WDQS Deploy] Rollback of canary `wdqs1003` initiated
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
* 01:25 ryankemper: [WDQS Deploy] Automated tests passing on canary `wdqs1003` but manually visiting `http://localhost:9999` (my tunnel to `wdqs1003`) gives `404 Not Found` from nginx; aborting deploy
* 12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: {{Gerrit|79808a90a95dd5dac2b532b87fb7ec1a490ea0f0}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 56s)
* 01:23 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
* 01:22 ryankemper: [WDQS Deploy] Tests on canary `wdqs1003` passing before start of deploy, proceeding with deploy of wdqs `0.3.60` to canary
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 legoktm: legoktm@mwmaint1002:~$ mwscript initSiteStats.php --wiki=trwikivoyage --update
* 11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - [[phab:T288843|T288843]]
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2369.codfw.wmnet
* 11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2367.codfw.wmnet
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2365.codfw.wmnet
* 11:46 marostegui: Upgrade db1105 (s1,s2)
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2363.codfw.wmnet
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2369.codfw.wmnet
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2365.codfw.wmnet
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2367.codfw.wmnet
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
* 00:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2363.codfw.wmnet
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c31b04e50101a60db7ae8acae64bc031f5e1007}}: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
* 10:56 marostegui: Upgrade clouddb1021
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 10:51 moritzm: failover master in ganeti-test to ganeti2026
* 10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
* 10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
* 10:38 oblivian@deploy1002: Synchronized w/static.php: Config: [[gerrit:730182{{!}}static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:37 marostegui: Upgrade db1101 (s7,s8)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - [[phab:T247963|T247963]]
* 10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
* 10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
* 09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
* 09:37 godog: move graphite/statsd writes to graphite2003 - [[phab:T247963|T247963]]
* 09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3  # [[phab:T281169|T281169]]
* 09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # [[phab:T281169|T281169]]
* 09:19 marostegui: Stop slave on db2112 [[phab:T290865|T290865]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:03 XioNoX: push anycast tuning to all Telia transit links - [[phab:T288843|T288843]]
* 08:50 godog: point graphite.discovery.wmnet to graphite2003 - [[phab:T247963|T247963]]
* 08:40 XioNoX: push prep-work for anycast tuning to all sites - [[phab:T288843|T288843]]
* 08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
* 08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
* 08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - [[phab:T288843|T288843]]
* 08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
* 06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
* 06:06 marostegui: Upgrade dbstore1005
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
* 06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:03 marostegui: Upgrade db1184, db1178
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 05:46 marostegui: Reimage db2112 (s1 codfw master) [[phab:T290865|T290865]]
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2021-01-20 ==
== 2021-10-18 ==
* 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 23:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b654980240d51fff3c6e9c48f7076d4609c2560f}}: Create an alias for the Draft namespace on hrwiki ([[phab:T291755|T291755]]) (duration: 00m 56s)
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # [[phab:T291761|T291761]]
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abe777d28594da852e49ccb1c1597b2598f3e483}}: Create Rhymes namespace for thwiktionary ([[phab:T291761|T291761]]) (duration: 00m 57s)
* 23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 mutante: releases2002 - rebooting VM
* 22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests ([[phab:T288848|T288848]]) (duration: 00m 56s)
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2361.codfw.wmnet
* 22:06 maryum: deployed security patch for [[phab:T293589|T293589]]
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2359.codfw.wmnet
* 21:23 maryum: deployed security patch for [[phab:T293556|T293556]]
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
* 21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki {{!}} Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
* 20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
* 23:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2357.codfw.wmnet
* 19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2361.codfw.wmnet
* 19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2359.codfw.wmnet
* 19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots ([[phab:T160122|T160122]])
* 23:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2355.codfw.wmnet
* 19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: {{Gerrit|ac7b4fc2ccc69589e00a42f49d18a8f6d71777f2}}: Revert 727328 ([[phab:T293554|T293554]]) (duration: 00m 56s)
* 23:03 legoktm: updated docker-registry.discovery.wmnet/wikimedia-buster image
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:01 mutante: mw2331, mw2333 - scap pull
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - [[phab:T277193|T277193]] (duration: 00m 57s)
* 22:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group ([[phab:T293621|T293621]])
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 17:51 mutante: puppet run on all bastion hosts via cumin
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
* 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 22:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia [[phab:T292196|T292196]]
* 22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
* 15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
* 14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2353.codfw.wmnet
* 14:54 herron: rebuilt and uploaded kafkatee for bullseye [[phab:T292196|T292196]]
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2351.codfw.wmnet
* 14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
* 14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731346{{!}}[beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361)]] (duration: 00m 56s)
* 21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244] (duration: 00m 07s)
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 milimetric@deploy1001: Started deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244]
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:34 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244] (duration: 10m 52s)
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:24 milimetric@deploy1001: Started deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244]
* 14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (2/2) (duration: 00m 56s)
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (1/2) (duration: 00m 56s)
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:731014{{!}}Unconditionally enable Wikibase dispatching via jobs (T291828)]] (duration: 00m 56s)
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
* 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 11:55 Lucas_WMDE: UTC morning backport window done
* 20:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (2/2) (duration: 00m 56s)
* 20:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (1/2) (duration: 00m 56s)
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
* 11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2333.codfw.wmnet
* 11:49 marostegui: Reimage db2079 (codfw s8 master) [[phab:T290868|T290868]]
* 20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730747{{!}}Set dispatchViaJobsAllowedClients to null everywhere (T291828)]] (duration: 00m 56s)
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2333.codfw.wmnet
* 11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731239{{!}}Make deduplication actually work for DispatchChangesJob (T291118)]] (duration: 00m 55s)
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (2/2) (duration: 00m 56s)
* 20:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 effie: restart mc-gp2001, mc-gp2002, mc-gp2003 for [[phab:T269596|T269596]]
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (duration: 00m 56s)
* 20:31 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.27 (duration: 03m 05s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.27
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train: proceeding to group1
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
* 20:17 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒🍵 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:731237{{!}}Don't filter by change Id when dispatching to client wikis ()]] (duration: 00m 59s)
* 20:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:48 moritzm: installing node-tar security updates on buster
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - [[phab:T292619|T292619]]
* 20:06 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train status as of deploy window: currently blocked at group0 on [[phab:T272508|T272508]]
* 09:38 godog: sync metrics from graphite1004 to graphite2003 - [[phab:T247963|T247963]]
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:13 moritzm: installing apr security updates on bullseye
* 19:50 bblack: lvs1015: bringing pybal back online
* 08:57 godog: clean up graphite metrics not modified for >= ~3yr (1024 days)
* 19:47 bblack: lvs1015: stopping pybal to try to fix a lingering ifup service state issue on the host, which may require downing an interface
* 07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 19:33 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|5c941678ec739dd6b5257b4a8f866b7e3a257f45}}: Revert: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 04s)
* 07:34 elukey: depool + restart blazegraph on wdqs1013
* 19:24 effie: depool and repool thumbor* to upgrade python-thumbor-wikimedia to v2.9
* 07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:22 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|13fb338249b3ec73e380c4971ee697f28a2f6d76}}: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 05s)
* 06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:12 urbanecm@deploy1001: Synchronized wmf-config/config/kuwiki.yaml: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 2/2) (duration: 01m 05s)
* 19:11 XioNoX: add BGP to Lumen in eqiad
* 19:11 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 1/2) (duration: 01m 04s)
* 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2325.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2327.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2329.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2316.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2329.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2327.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2325.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2316.codfw.wmnet
* 18:42 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/includes/View/AbuseFilterViewDiff.php: Backport: [[gerrit:657366{{!}}Catch ClosestFilterVersionNotFoundException in ViewDiff (T272505)]] (duration: 01m 06s)
* 18:29 bblack: lvs1015: re-enabling puppet + pybal - [[phab:T272258|T272258]]
* 18:25 XioNoX: draining esams-eqiad link
* 18:24 mutante: ganeti - creating 150G virtual hard disk and adding it to releases2002 for [[phab:T272092|T272092]]
* 18:22 mutante: ganeti - creating 105G virtual hard disk and adding it to releases1002 for [[phab:T272092|T272092]]
* 18:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:01 bblack: lvs1015 - shutdown for [[phab:T272258|T272258]]
* 17:58 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:54 bblack: lvs1015: stopping pybal with puppet disabled for [[phab:T272258|T272258]]
* 17:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:24 volans@cumin2001: START - Cookbook sre.dns.netbox
* 16:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 16:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 15:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 15:55 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 15:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13858 and previous config saved to /var/cache/conftool/dbconfig/20210120-154726-kormat.json
* 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13857 and previous config saved to /var/cache/conftool/dbconfig/20210120-153223-kormat.json
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
* 15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 15:18 brennen: 1.36.0-wmf.27 train unblocked, proceeding to group0 ([[phab:T271341|T271341]])
* 15:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 15:17 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13856 and previous config saved to /var/cache/conftool/dbconfig/20210120-151719-kormat.json
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 15:15 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13855 and previous config saved to /var/cache/conftool/dbconfig/20210120-151555-kormat.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13854 and previous config saved to /var/cache/conftool/dbconfig/20210120-150216-kormat.json
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 15:00 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 66%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13853 and previous config saved to /var/cache/conftool/dbconfig/20210120-150051-kormat.json
* 14:59 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on all wikis - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 05s)
* 14:56 kormat@cumin1001: dbctl commit (dc=all): 'db1109 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13852 and previous config saved to /var/cache/conftool/dbconfig/20210120-145605-kormat.json
* 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 14:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on testwiki - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 06s)
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 14:45 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 33%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13851 and previous config saved to /var/cache/conftool/dbconfig/20210120-144547-kormat.json
* 14:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 14:26 kormat@cumin1001: dbctl commit (dc=all): 'db1076 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13850 and previous config saved to /var/cache/conftool/dbconfig/20210120-142636-kormat.json
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13849 and previous config saved to /var/cache/conftool/dbconfig/20210120-142139-kormat.json
* 14:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 14:12 kormat@cumin1001: dbctl commit (dc=all): 'db1075 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13848 and previous config saved to /var/cache/conftool/dbconfig/20210120-141230-kormat.json
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 14:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 13:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/Translate/: {{Gerrit|20decbd5cc3de0af655b9419cf69fc442ab056a4}}: Add flag to toggle the usage of the group synchronization cache ([[phab:T272428|T272428]]; [[phab:T182433|T182433]]) (duration: 01m 10s)
* 13:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
* 12:31 godog: bounce icinga on alert1001
* 12:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
* 12:10 matthiasmullie: EU config window done
* 12:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
* 12:08 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2fc57b259}}: Remove MediaSearch survey (duration: 01m 10s)
* 12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
* 12:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13847 and previous config saved to /var/cache/conftool/dbconfig/20210120-112808-root.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13846 and previous config saved to /var/cache/conftool/dbconfig/20210120-111305-root.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13845 and previous config saved to /var/cache/conftool/dbconfig/20210120-105801-root.json
* 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
* 10:51 XioNoX: Discard the non-whitelisted 172.16.0.0/12 traffic - [[phab:T209082|T209082]]
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
* 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13844 and previous config saved to /var/cache/conftool/dbconfig/20210120-104257-root.json
* 10:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13842 and previous config saved to /var/cache/conftool/dbconfig/20210120-103449-marostegui.json
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
* 10:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2027.codfw.wmnet
* 10:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2027.codfw.wmnet
* 10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2026.codfw.wmnet
* 10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2026.codfw.wmnet
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2025.codfw.wmnet
* 09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2025.codfw.wmnet
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2024.codfw.wmnet
* 09:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2024.codfw.wmnet
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2021.codfw.wmnet
* 09:32 moritzm: installing cuminunpriv1001
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2021.codfw.wmnet
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2020.codfw.wmnet
* 09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2020.codfw.wmnet
* 09:19 XioNoX: configure Lumen interfaces
* 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2019.codfw.wmnet
* 09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2019.codfw.wmnet
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2018.codfw.wmnet
* 09:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2018.codfw.wmnet
* 00:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:656284{{!}}Update /analytics/legacy/homepagemodule/ schema version to 1.1.0 (T270309)]] (duration: 01m 03s)
* 00:30 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655863{{!}}(no-op) GrowthExperiments: Disable link recommendations (T261408)]] (duration: 01m 05s)
* 00:09 legoktm: uploaded docker-report 0.0.4-1~deb9u1 to stretch-wikimedia ([[phab:T179696|T179696]])
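
The paired START/END entries written by the <code>sre.hosts.reboot-single</code> cookbook can be matched up to see how long each reboot took. The following is a minimal sketch, not an existing Wikimedia tool: the function name <code>reboot_durations</code>, the regular expression, and the <code>SAMPLE</code> list are all illustrative assumptions, and the pattern is modelled only on the entry wording visible in this log. It assumes newest-first ordering, that START and END fall on the same day, and it keeps only the earliest run when a host was rebooted more than once.

```python
import re
from datetime import datetime

# Hypothetical pattern, modelled on the reboot-single entries in this log.
ENTRY = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) \S+: "
    r"(?P<kind>START|END \((?P<status>PASS|FAIL)\)) - "
    r"Cookbook sre\.hosts\.reboot-single "
    r"(?:\(exit_code=\d+\) )?for host (?P<host>\S+)$"
)


def reboot_durations(lines):
    """Return {host: (minutes, status)} for each START/END pair.

    Lines are expected newest-first, as in the log, so an END entry is
    seen before its matching START entry.
    """
    pending = {}  # host -> (end_time, status) awaiting its START
    result = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # not a reboot-single entry
        when = datetime.strptime(m["time"], "%H:%M")
        host = m["host"]
        if m["kind"].startswith("END"):
            pending[host] = (when, m["status"])
        elif host in pending:
            end, status = pending.pop(host)
            minutes = int((end - when).total_seconds() // 60)
            result[host] = (minutes, status)
    return result


# Four entries copied from the 2021-01-20 section above.
SAMPLE = [
    "* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet",
    "* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet",
    "* 12:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet",
    "* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet",
]

if __name__ == "__main__":
    for host, (minutes, status) in reboot_durations(SAMPLE).items():
        print(f"{host}: {minutes} min ({status})")
```

Entries that do not match the pattern (helmfile syncs, dbctl commits, free-text notes) are skipped, so the whole day's list can be passed in unchanged.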


== 2021-01-19 ==
== 2021-10-16 ==
* 21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2314.codfw.wmnet
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.26
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2313.codfw.wmnet
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2312.codfw.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2315.codfw.wmnet
* 21:46 ottomata: wiping kafka-test cluster data and starting from scratch - [[phab:T255973|T255973]]
* 21:00 Urbanecm: Start of `foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` ([[phab:T269713|T269713]])
* 20:09 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
* 20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2315.codfw.wmnet
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2314.codfw.wmnet
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2313.codfw.wmnet
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2312.codfw.wmnet
* 19:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 19:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 19:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 19:22 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 18:58 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.22 (duration: 03m 53s)
* 18:47 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:43 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:42 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.27 (duration: 41m 57s)
* 18:39 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:01 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.27
* 17:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
* 17:59 brennen: starting deploy-promote to testwikis for 1.36.0-wmf.27 ([[phab:T271341|T271341]])
* 17:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
* 17:30 Urbanecm: Start of `foreachwikiindblist group1 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` ([[phab:T269713|T269713]])
* 17:08 Urbanecm: Run extensions/AbuseFilter/maintenance/MigrateAflFilter.php for all group0 wikis ([[phab:T269713|T269713]])
* 17:06 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=test2wiki --batch-size=1000 # [[phab:T269713|T269713]]
* 17:04 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=testwiki --batch-size=1000 # [[phab:T269713|T269713]]
* 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
* 16:50 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
* 16:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
* 16:46 brennen: 1.36.0-wmf.27 was branched at {{Gerrit|fbb516d8e33924c6cb66c93bb6d42907558c31f3}} for [[phab:T271341|T271341]]
* 16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
* 16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
* 16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:41 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be1046.eqiad.wmnet
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13838 and previous config saved to /var/cache/conftool/dbconfig/20210119-163637-root.json
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13837 and previous config saved to /var/cache/conftool/dbconfig/20210119-162134-root.json
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:07 moritzm: powercycling ms-be1046, stuck during boot
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13836 and previous config saved to /var/cache/conftool/dbconfig/20210119-160630-root.json
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13835 and previous config saved to /var/cache/conftool/dbconfig/20210119-155127-root.json
* 15:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:43 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cuminunpriv1001.eqiad.wmnet
* 15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 15:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host cuminunpriv1001.eqiad.wmnet
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 15:15 Urbanecm: Run `foreachwikiindblist closed extensions/AbuseFilter/maintenance/MigrateAflFilter.php` ([[phab:T269713|T269713]])
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:03 Jeff_Green: authdns-update DNS adjustments for frdata-(eqiad{{!}}codfw)
* 14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 14:19 marostegui: Sanitize trwikivoyage on db2094:3315, db1124:3315, db1154:3315 [[phab:T271261|T271261]]
* 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 14:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:08 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T271264|T271264]])
* 14:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 13:49 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T271264|T271264]])
* 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
* 13:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
* 13:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
* 13:39 Urbanecm: trwikivoyage is created
* 13:39 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 53s)
* 13:38 godog: bounce logstash on logstash1025 to debug unindexable logs
* 13:37 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 05s)
* 13:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:34 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating trwikivoyage ([[phab:T271260|T271260]])
* 13:32 urbanecm@deploy1001: Synchronized dblists: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:31 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
* 13:30 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 56s)
* 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
* 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
* 12:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
* 12:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
* 12:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1034.eqiad.wmnet
* 12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'staging' .
* 12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'production' .
* 12:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:656842{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:656842{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1034.eqiad.wmnet
* 12:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
* 12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|338c0f9fe32512266c3030f7c9b7f8804ed30432}}: wgAbuseFilterAflFilterMigrationStage: Make WRITE_BOTH everywhere ([[phab:T269712|T269712]]) (duration: 00m 56s)
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
* 12:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
* 12:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
* 12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
* 12:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a4cbe662655edaa4f6c36e69877766a6a48d828}}: Revert "Switch fiwiki to their 500k temporary logo!" ([[phab:T270974|T270974]]) (duration: 00m 56s)
* 11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
* 11:54 moritzm: installing remaining openssl 1.1 updates on stretch
* 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
* 11:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1026.eqiad.wmnet
* 11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1026.eqiad.wmnet
* 11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 11:33 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1025.eqiad.wmnet
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1025.eqiad.wmnet
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1024.eqiad.wmnet
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1024.eqiad.wmnet
* 11:10 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1023.eqiad.wmnet
* 11:06 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1023.eqiad.wmnet
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 10:56 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2017.codfw.wmnet
* 10:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
* 09:51 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2017.codfw.wmnet
* 09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
* 09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2016.codfw.wmnet
* 09:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2016.codfw.wmnet
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13828 and previous config saved to /var/cache/conftool/dbconfig/20210119-090100-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078, depooled by mistake', diff saved to https://phabricator.wikimedia.org/P13827 and previous config saved to /var/cache/conftool/dbconfig/20210119-085918-marostegui.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13826 and previous config saved to /var/cache/conftool/dbconfig/20210119-085856-marostegui.json
* 08:54 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13825 and previous config saved to /var/cache/conftool/dbconfig/20210119-080839-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13824 and previous config saved to /var/cache/conftool/dbconfig/20210119-075336-root.json
* 07:41 oblivian@deploy1001: Synchronized README: Null deployments to test php restarts from scap (duration: 01m 23s)
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13823 and previous config saved to /var/cache/conftool/dbconfig/20210119-073832-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13822 and previous config saved to /var/cache/conftool/dbconfig/20210119-072329-root.json
* 07:14 elukey: clean up prometheus es exporter units on es-codfw nodes not needed anymore
* 07:02 marostegui: Stop MySQL on db1082 [[phab:T272008|T272008]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13821 and previous config saved to /var/cache/conftool/dbconfig/20210119-065748-marostegui.json
* 06:04 marostegui: Upgrade kernel on pc2007 pc2008 pc2009 pc2010 [[phab:T272121|T272121]]
* 04:39 Krinkle: unlocked per https://phabricator.wikimedia.org/T272215#6755025
* 04:37 Krinkle: locks scap on deploy1001 as precaution


== 2021-01-18 ==
== 2021-10-15 ==
* 21:33 eileen: civicrm revision changed from {{Gerrit|4220fc8177}} to {{Gerrit|a4caad22b1}}, config revision is {{Gerrit|f08249ecf9}}
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2311.codfw.wmnet
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2310.codfw.wmnet
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2309.codfw.wmnet
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2307.codfw.wmnet
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2309.codfw.wmnet
* 22:34 mutante: apt2001 - upgraded nginx
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2307.codfw.wmnet
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2310.codfw.wmnet
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2311.codfw.wmnet
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 20:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2305.codfw.wmnet
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2303.codfw.wmnet
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2277.codfw.wmnet
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2276.codfw.wmnet
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 20:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2303.codfw.wmnet
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2305.codfw.wmnet
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2277.codfw.wmnet
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2276.codfw.wmnet
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2275.codfw.wmnet
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2274.codfw.wmnet
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2271.codfw.wmnet
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1136,1138].eqiad.wmnet
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2274.codfw.wmnet
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2275.codfw.wmnet
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:34 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1136,1138].eqiad.wmnet
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2273.codfw.wmnet
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2271.codfw.wmnet
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 18:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:20 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1130.eqiad.wmnet
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1130.eqiad.wmnet
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 06:20 urbanecm: Start server-side upload for 1 video file
* 18:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1128.eqiad.wmnet
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 18:12 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1128.eqiad.wmnet
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1127].eqiad.wmnet
* 00:07 brennen: end of UTC late backport & config training window
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 17:49 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1127].eqiad.wmnet
* 17:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 17:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 17:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1121-1123].eqiad.wmnet
* 17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 17:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 17:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1121-1123].eqiad.wmnet
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1120.eqiad.wmnet
* 17:42 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1120.eqiad.wmnet
* 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1118.eqiad.wmnet
* 17:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1118.eqiad.wmnet
* 17:32 mutante: reimaging mw2271,mw2273,mw2274,mw227 (codfw only)
* 16:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 16:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 16:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 15:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 15:48 moritzm: installing wavpack security updates
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 14:43 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:31 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:30 arturo: updating packages in buster-wikimedia/thirdparty/ceph-nautilus-buster ([[phab:T272296|T272296]])
* 14:26 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:18 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 13:34 moritzm: uploaded wmf-sre-laptop 0.3.2 to apt.wikimedia.org
* 13:26 volans: installed spicerack 0.0.48-1+deb10u1 on cumin hosts
* 13:12 marostegui: Upgrade db2071 to 10.4.17 - [[phab:T268457|T268457]]
* 13:08 XioNoX: add NAT rule on pfw3-eqiad - [[phab:T272066|T272066]]
* 12:56 XioNoX: add NAT rule on pfw3-codfw - [[phab:T272066|T272066]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
* 12:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
* 12:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 12:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
* 12:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 12:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
* 12:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
* 12:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 12:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
* 12:08 volans: uploaded spicerack_0.0.48 to apt.wikimedia.org buster-wikimedia
* 12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
* 12:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 12:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
* 11:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
* 11:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
* 11:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
* 11:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
* 11:10 hashar: Restarting Gerrit main instance on gerrit1001.wikimedia.org
* 11:08 hashar: Restarting Gerrit replica on gerrit2001.wikimedia.org
* 10:58 moritzm: installing python2.7 security updates on Stretch
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13799 and previous config saved to /var/cache/conftool/dbconfig/20210118-102959-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13798 and previous config saved to /var/cache/conftool/dbconfig/20210118-101456-root.json
* 10:00 _joe_: restarting pybal on lvs1016, not talking to its etcd server
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13797 and previous config saved to /var/cache/conftool/dbconfig/20210118-095952-root.json
* 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13796 and previous config saved to /var/cache/conftool/dbconfig/20210118-094449-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13795 and previous config saved to /var/cache/conftool/dbconfig/20210118-092546-marostegui.json
* 09:24 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13794 and previous config saved to /var/cache/conftool/dbconfig/20210118-092429-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105:3311 from vslow', diff saved to https://phabricator.wikimedia.org/P13793 and previous config saved to /var/cache/conftool/dbconfig/20210118-092003-marostegui.json
* 09:13 moritzm: installing openssl 1.1 security updates on stretch
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13791 and previous config saved to /var/cache/conftool/dbconfig/20210118-090926-root.json
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:01 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13790 and previous config saved to /var/cache/conftool/dbconfig/20210118-085422-root.json
* 08:46 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:42 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13788 and previous config saved to /var/cache/conftool/dbconfig/20210118-083919-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to stop replication, place db1105:3311 temporarily in vslow [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13787 and previous config saved to /var/cache/conftool/dbconfig/20210118-081740-marostegui.json
* 08:15 moritzm: installing remaining openssl 1.0 security updates on stretch
* 08:13 elukey: clean up old archiva debs and upload 2.2.4-3 to buster-wikimedia - [[phab:T272082|T272082]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13786 and previous config saved to /var/cache/conftool/dbconfig/20210118-080122-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13785 and previous config saved to /var/cache/conftool/dbconfig/20210118-074618-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13784 and previous config saved to /var/cache/conftool/dbconfig/20210118-073115-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13783 and previous config saved to /var/cache/conftool/dbconfig/20210118-071611-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13782 and previous config saved to /var/cache/conftool/dbconfig/20210118-065312-marostegui.json
* 06:35 marostegui: Reboot dbproxy2001, dbproxy2002, dbproxy2003 for kernel upgrade
* 06:22 marostegui: Reboot db1154 and db1155 for kernel upgrade


== 2021-01-16 ==
== 2021-10-14 ==
* 12:18 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-app-canary and A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 12:10 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 11:23 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 22:31 mutante: depooling mw1452 for testing
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4 refs [[phab:T281168|T281168]]
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 18:41 urbanecm: UTC evening B&C done
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 17:42 rzl: depool mw1452 for training
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:33 moritzm: installing node-ansi-regex security updates
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
* 14:23 moritzm: installing krb5 security updates on KDCs
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2021-01-15 ==
== 2021-10-13 ==
* 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 21:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 21:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (3/3; [[phab:T272075|T272075]]) (duration: 00m 55s)
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 20:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-tagline-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (2/3; [[phab:T272075|T272075]]) (duration: 00m 55s)
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 20:36 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (1/3; [[phab:T272075|T272075]]) (duration: 00m 58s)
* 21:47 foks: removing 8 files for legal compliance
* 20:21 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: {{Gerrit|66e6be391ecfde7ca0604146ab978987ce472b5c}}: Set anniversary logo for frwiki (1/3; [[phab:T272075|T272075]]) (duration: 01m 54s)
* 21:03 foks: removing 2 files for legal compliance
* 17:17 legoktm: legoktm@contint2001:~$ sudo systemctl reload apache2 # for [[phab:T272159|T272159]]
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:17 bstorm: canceled downtime for maintain-dbusers on labstore1004 [[phab:T272127|T272127]]
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:30 elukey: restart archiva to apply hot-fix for [[phab:T272082|T272082]]
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1002.wikimedia.org
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1001.wikimedia.org
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1001.wikimedia.org
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 14:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2003.wikimedia.org
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 14:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2003.wikimedia.org
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2004.wikimedia.org
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( [[phab:T285867|T285867]])
* 14:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2004.wikimedia.org
* 19:08 mutante: gitlab2001 - started workhorse which was for some reason marked as down after restore command ran
* 11:30 jynus: rolling restart of eqiad source backup dbs
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 11:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 11:11 XioNoX: update cloud-in4 firewall rules
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfimed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 10:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 10:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 10:56 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc2036.codfw.wmnet
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 10:53 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 10:53 vgutierrez: re-enable puppet on acme-chief clients
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:53 jynus: rolling restart of dbprov2* hosts
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 10:52 _joe_: rebuilding the docker images coredns,nutcracker,prometheus-statsd-exporter,service-checker,wmfdebug to use wikimedia-buster as a base
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:46 vgutierrez: disable puppet on acme-chief clients
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:45 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 10:43 effie: reboot mc2036 - [[phab:T269596|T269596]]
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 10:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:07 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:02 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:58 reedy@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: [[phab:T272103|T272103]] (duration: 00m 57s)
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:36 vgutierrez: rolling restart acme-chief servers to catch up on kernel upgrades
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:24 jynus: rolling restart of dbprov1* hosts
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 09:07 moritzm: installing bast5002 [[phab:T257324|T257324]]
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:45 moritzm: installing bast4003 [[phab:T257324|T257324]]
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:39 marostegui: Restart clouddb1013-clouddb1020
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:28 ryankemper: WDQS puppet run successful
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:57 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 03:49 eileen: civicrm revision changed from {{Gerrit|f417a510a5}} to {{Gerrit|4220fc8177}}, config revision is {{Gerrit|f08249ecf9}}
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:48 moritzm: reverted to clean package state on deneb
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2021-01-14 ==
== 2021-10-12 ==
* 23:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2236.codfw.wmnet
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T272094|T272094]] Change enwiki logo to 20th Birthday Celebration one (duration: 00m 56s)
* 23:16 urbanecm: UTC late B&C window done
* 23:11 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-2x.png: [[phab:T272094|T272094]] Sync out logo before going live, 3/3 (duration: 00m 55s)
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 23:09 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-1.5x.png: [[phab:T272094|T272094]] Sync out logo before going live, 2/3 (duration: 00m 55s)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 23:07 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20.png: [[phab:T272094|T272094]] Sync out logo before going live, 1/3 (duration: 01m 02s)
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:02 mutante: Happy 20th Birthday Wikipedia - https://20.wikipedia.org - https://gerrit.wikimedia.org/r/656268
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2270.codfw.wmnet
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2268.codfw.wmnet
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2269.codfw.wmnet
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2269.codfw.wmnet
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2270.codfw.wmnet
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4 refs [[phab:T281168|T281168]]
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2268.codfw.wmnet
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4 refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 22:04 thcipriani: restart apache on gerrit1001
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4 refs [[phab:T281168|T281168]]
* 21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 21:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 21:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:18 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2242.codfw.wmnet
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2241.codfw.wmnet
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 17:12 moritzm: installing rsync bugfix updates
* 21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2258.codfw.wmnet
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 20:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:17 mutante: ACKing all unhandled crit alerts about systemd on clouddb hosts - notifications are disabled but this cleans up Icinga web UI noise - [[phab:T267090|T267090]]
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 20:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 20:05 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 urbanecm@deploy1001: Synchronized dblists/closed.dblist: {{Gerrit|d3e274e9b953f5edda07fa3a016b7291a451ceb2}}: Close lrcwiki ([[phab:T272041|T272041]]) (duration: 00m 58s)
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 19:03 mutante: mc1024 - attempting to power on via mgmt, went down and powered down
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 18:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 18:38 Amir1: started mass deletion of lrcwiki ([[phab:T272041|T272041]]) - https://w.wiki/uPV
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:36 jynus: restarting backup1002, backup2002 [[phab:T271913|T271913]]
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 18:05 jynus: restarting backup1001, backup2001 [[phab:T271913|T271913]]
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 16:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 16:32 moritzm: installing php-pear updates on stretch
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 16:03 moritzm: installing tomcat8 security updates
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 15:40 moritzm: installing sqlite3 security updates on Stretch
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 15:30 papaul: power down ms-be2022 for maintenance
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:19 otto@deploy1001: Finished deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - [[phab:T264358|T264358]] (duration: 02m 16s)
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 15:16 otto@deploy1001: Started deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - [[phab:T264358|T264358]]
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 14:56 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 14:28 arturo: running homer in asw-b-codfw* ([[phab:T271519|T271519]])
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 14:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 14:24 arturo: running homer in asw-b-codfw* ([[phab:T271519|T271519]])
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.26
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 14:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.26/skins/CologneBlue/includes/CologneBlueHooks.php: Edit link may not be present, avoid undefined index notice [[phab:T271978|T271978]] (duration: 01m 07s)
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 13:56 aborrero@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 13:47 marostegui: Restart mysql on db2094 for openssl upgrades test
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 13:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 13:23 moritzm: restarting mw canaries for openssl update
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:22 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:22 aborrero@cumin2001: START - Cookbook sre.dns.netbox
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 13:17 moritzm: installing openssl1.0 security updates on stretch
* 11:34 urbanecm: UTC morning B&C window done
* 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:11 moritzm: installing xerces-c security updates on stretch
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 12:50 volans: upgraded python3-pynetbox to 5.3.0-1 on all affected hosts - [[phab:T266487|T266487]]
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:49 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:34 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:33 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:29 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 12:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 12:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 12:24 XioNoX: push pfw3 firewall rules - [[phab:T271935|T271935]]
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 12:16 volans: upgraded python3-pynetbox to 5.3.0-1 on cumin2001
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 12:16 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 12:14 elukey@cumin1001: END (ERROR) - Cookbook sre.presto.reboot-workers (exit_code=97) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 12:14 volans: built and uploaded python3-pynetbox 5.3.0-1 to apt.wikimedia.org - [[phab:T266487|T266487]]
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 12:10 awight: EU config window finished.
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:09 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:656116{{!}}Remove unused WMDE TeWü QuickSurveys (T253112, T272013)]] (duration: 01m 07s)
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 12:02 moritzm: rebooting miscweb1002
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 11:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 11:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 11:34 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4164318]: (no justification provided) (duration: 30m 34s)
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:22 moritzm: installing RT security updates
* 11:22 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 11:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 11:04 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:04 oblivian@deploy1001: deploy aborted: (no justification provided) (duration: 00m 14s)
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:03 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}
* 10:28 jbond42: failover apt.wikimedia.org back to apt1001
* 10:28 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
* 10:25 jbond42: reboot apt1001
* 10:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:16 jbond42: failover apt.wikimedia.org to apt2001
* 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:12 jbond42: reboot apt2001
* 10:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13768 and previous config saved to /var/cache/conftool/dbconfig/20210114-093803-root.json
* 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13767 and previous config saved to /var/cache/conftool/dbconfig/20210114-092300-root.json
* 09:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:11 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13766 and previous config saved to /var/cache/conftool/dbconfig/20210114-090756-root.json
* 09:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13765 and previous config saved to /var/cache/conftool/dbconfig/20210114-085252-root.json
* 08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:51 vgutierrez: rolling restart of ncredir servers to catch up on kernel upgrades
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 XioNoX: standardize cloudsw interfaces to prepare for switches homerisation
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 [[phab:T271084|T271084]]', diff saved to https://phabricator.wikimedia.org/P13764 and previous config saved to /var/cache/conftool/dbconfig/20210114-084243-marostegui.json
* 08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:10 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 08:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 00:22 ryankemper: [[phab:T266492|T266492]] Restart of `relforge` successful
* 00:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 00:15 chaomodus: completed rebooting Netbox hosts, failure was due to report errors that would not have recovered.
* 00:14 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 00:13 ryankemper: `sudo -i cookbook sre.elasticsearch.rolling-restart relforge "relforge cluster restart" --task-id [[phab:T266492|T266492]] --nodes-per-run 1 --without-lvs`
* 00:13 ryankemper: (Forgot to tell it `relforge` isn't lvs-managed)
* 00:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 00:10 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of `relforge`
* 00:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2239.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2238.codfw.wmnet
* 00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2237.codfw.wmnet
* 00:01 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:01 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:00 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Rolling restart of `cloudelastic` was successful
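
The entries above share one shape: a bullet, an HH:MM timestamp, an author (optionally followed by `@host`), a colon, and a free-text message. As a minimal sketch, assuming that shape holds for every line, an entry could be split into fields as follows. The `parse_entry` helper and the `Entry` type are hypothetical names for illustration and are not part of any logging tool.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str
    user: str
    host: Optional[str]
    message: str


# Assumed line shape: "* HH:MM user[@host]: message"
ENTRY_RE = re.compile(r"^\* (\d{2}:\d{2}) ([^:@\s]+)(?:@(\S+?))?: (.*)$")


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log bullet into its fields, or return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    time, user, host, message = match.groups()
    return Entry(time, user, host, message)


with_host = parse_entry(
    "* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single"
)
without_host = parse_entry(
    "* 09:11 godog: swift eqiad-prod: add weight to ms-be106[0-3]"
)
print(with_host)
print(without_host)
```

Headings and blank lines do not match the pattern and come back as `None`, so a caller can skip them.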


== 2021-01-13 ==
== 2021-10-11 ==
* 23:53 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:53 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 23:49 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 23:49 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 23:46 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 23:44 chaomodus: rebooting Netbox instances to apply updates
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
* 12:53 moritzm: install apache security updates on buster
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 22:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 12:04 moritzm: install apache security updates on bullseye
* 22:53 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] `sudo -i cookbook sre.elasticsearch.rolling-restart cloudelastic "cloudelastic cluster restart" --task-id [[phab:T266492|T266492]] --nodes-per-run 1`
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 22:53 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 21:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 21:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 21:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 21:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 21:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2235.codfw.wmnet
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 21:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2232.codfw.wmnet
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2231.codfw.wmnet
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
* 21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
* 20:40 mutante: DNS - new project language "alt" added.  Altai (also Gorno-Altai) is a Turkic language, spoken officially in the Altai Republic, Russia.
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
* 20:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
* 20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
* 19:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|726e972bc8cff1ff8ed90c8dd853aae4997329f5}}: Set import sources for mrwikibooks ([[phab:T270402|T270402]]) (duration: 01m 04s)
* 19:47 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: [[gerrit:655919{{!}}Guard against this file being included twice]] [[phab:T271933|T271933]] (for real -- forgot to submodule update) (duration: 01m 04s)
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
* 19:42 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: [[gerrit:655919{{!}}Guard against this file being included twice]] [[phab:T271933|T271933]] (duration: 01m 04s)
* 19:39 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
* 19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo - Migrate SpecialMuteSubmit to EventGate - [[phab:T268517|T268517]] (duration: 01m 06s)
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
* 19:20 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
* 19:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
* 19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
* 19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
* 18:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
* 18:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
* 18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2227.codfw.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
* 18:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 18:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
* 17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
* 17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
* 17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
* 17:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
* 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
* 17:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
* 17:11 herron: beginning cutover of https://logstash.wikimedia.org frontend to ELK7 [[phab:T234854|T234854]]
* 17:02 mutante: mw2228 resetting DRAC/BMC - trying to solve remote IPMI issue - bmc-device --cold-reset; echo $?
* 17:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:39 sukhe: upload pdns-recursor_4.4.2-2wm1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:18 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:17 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:06 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/ProofreadPage/includes/Special/SpecialProofreadPages.php: {{Gerrit|d73ba7c1aa92190903cd4b07fe3e8cf1bed13d70}}: GlobalVarConfig::get should not be provided with the wg prefix ([[phab:T271932|T271932]]) (duration: 01m 07s)
* 15:56 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin1001 - [[phab:T257905|T257905]]
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:22 hashar: Stopping Jenkins CI on contint2001 to upgrade Jenkins # [[phab:T271507|T271507]]
* 15:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 15:06 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
* 15:06 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
* 15:05 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin2001 - [[phab:T257905|T257905]]
* 15:01 hashar: Upgraded Jenkins on releases1002 / releases2002 hosts # [[phab:T271507|T271507]]
* 14:57 moritzm: imported jenkins 2.263.2 (security release) to apt.wikimedia.org/buster-wikimedia
* 14:27 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/skins/Vector/includes/templates/legacy/Sidebar.mustache: {{Gerrit|5a117ded68b5e0fc7f9b4a8a4513780e57eceefe}}: Use {{link-mainpage}} in legacy sidebar same as new logo ([[phab:T271873|T271873]]) (duration: 01m 05s)
* 14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:15 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.26 (duration: 01m 03s)
* 14:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.26
* 13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:48 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:15 dcausse: European mid-day backport window done
* 12:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T239931|T239931]]: Revert "Disable sanity check cirrus jobs for Wikidata" (duration: 01m 16s)
* 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
* 11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
* 11:40 kart_: Updated cxserver to 2021-01-12-095820-production ([[phab:T234220|T234220]], [[phab:T270408|T270408]])
* 11:37 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:33 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:23 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13756 and previous config saved to /var/cache/conftool/dbconfig/20210113-111312-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight on es4 the master', diff saved to https://phabricator.wikimedia.org/P13755 and previous config saved to /var/cache/conftool/dbconfig/20210113-110419-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13754 and previous config saved to /var/cache/conftool/dbconfig/20210113-105809-root.json
* 10:57 volans: uploaded spicerack_0.0.47 to apt.wikimedia.org buster-wikimedia
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13753 and previous config saved to /var/cache/conftool/dbconfig/20210113-104305-root.json
* 10:35 jbond42: puppet re-enabled on all cp-text hosts
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13751 and previous config saved to /var/cache/conftool/dbconfig/20210113-102802-root.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce weight on es1021', diff saved to https://phabricator.wikimedia.org/P13750 and previous config saved to /var/cache/conftool/dbconfig/20210113-102245-marostegui.json
* 10:18 jbond42: disable puppet on the cp::text to deploy block list changes 651174 + 651171
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13749 and previous config saved to /var/cache/conftool/dbconfig/20210113-101606-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13748 and previous config saved to /var/cache/conftool/dbconfig/20210113-100253-root.json
* 09:59 marostegui: Enable report_host on es1020 [[phab:T271106|T271106]]
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13747 and previous config saved to /var/cache/conftool/dbconfig/20210113-095834-marostegui.json
* 09:49 marostegui: Enable report_host on all codfw sby masters - [[phab:T271106|T271106]]
* 09:42 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 09:05 ayounsi@deploy1001: Finished deploy [homer/deploy@723ebfe]: Netbox 2.9 changes (duration: 03m 11s)
* 09:03 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 09:02 ayounsi@deploy1001: Started deploy [homer/deploy@723ebfe]: Netbox 2.9 changes
* 09:02 moritzm: installing efivar bugfix update
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 moritzm: draining ganeti4003 for eventual reboot
* 08:46 ema: cp5008: re-enable puppet to undo JIT tslua experiment [[phab:T265625|T265625]]
* 08:35 moritzm: failover ganeti master in ulsfo to ganeti4002
* 08:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:19 moritzm: draining ganeti4002 for eventual reboot
* 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:04 ryankemper: [WDQS Deploy] Deploy is complete, and the WDQS service is healthy
* 07:59 moritzm: draining ganeti4001 for eventual reboot
* 07:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 07:29 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 07:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts simultaneously: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 07:28 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@fdd2c2f]: 0.3.59 (duration: 14m 23s)
* 07:15 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` following canary deploy. Proceeding to rest of fleet...
* 07:13 ryankemper@deploy1001: Started deploy [wdqs/wdqs@fdd2c2f]: 0.3.59
* 07:13 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` prior to start of deploy. Proceeding with canary deploy of version `0.3.59`...
* 07:04 ryankemper: [[phab:T266492|T266492]] [[phab:T268779|T268779]] [[phab:T265699|T265699]] Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index.
* 07:03 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13745 and previous config saved to /var/cache/conftool/dbconfig/20210113-065535-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13744 and previous config saved to /var/cache/conftool/dbconfig/20210113-064031-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13743 and previous config saved to /var/cache/conftool/dbconfig/20210113-062528-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13742 and previous config saved to /var/cache/conftool/dbconfig/20210113-061024-root.json
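
The db1079 entries above show a staged repool: 25%, 50%, 75% and 100%, logged at 06:10, 06:25, 06:40 and 06:55. The following is a minimal sketch of that schedule, assuming a fixed 15-minute interval inferred from those timestamps. The `repool_schedule` function is a hypothetical helper for illustration; it does not call dbctl and is not how the actual repooling script is implemented.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool starting at `start`."""
    return [
        (start + timedelta(minutes=index * interval_minutes), pct)
        for index, pct in enumerate(steps)
    ]


# Start time taken from the first db1079 entry in the log (2021-01-13 06:10).
schedule = repool_schedule(datetime(2021, 1, 13, 6, 10))
for when, pct in schedule:
    print(f"{when:%H:%M} db1079 (re)pooling @ {pct}%")
```

Raising the percentage in steps limits how much traffic reaches a freshly restarted or cloned host before it has had time to warm up.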


== 2021-01-12 ==
== 2021-10-09 ==
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2225.codfw.wmnet
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2224.codfw.wmnet
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case [[phab:T266487|T266487]] (duration: 00m 05s)
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:46 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case [[phab:T266487|T266487]]
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 22:37 chaomodus: Upgrade of Netbox to 2.9 complete, checking support software. [[phab:T266487|T266487]]
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:33 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production [[phab:T266487|T266487]] (duration: 02m 33s)
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 22:30 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production [[phab:T266487|T266487]]
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 22:12 chaomodus: Merged Netbox 2.9 related changes in puppet and -extras; testing on -next [[phab:T266487|T266487]]
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 22:07 bblack: reboot authdns1001 - [[phab:T266746|T266746]]#6741647
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:04 chaomodus: proceeding with Netbox 2.9 upgrade [[phab:T266487|T266487]]
* 22:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
* 21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
* 21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
* 21:50 jforrester@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/modules/mode-abusefilter.js: [[phab:T271487|T271487]] Don't pass protocol-relative URLs to the Ace worker (duration: 01m 06s)
* 21:41 ottomata: rolling restart of eventgate-analytics-external pods
* 20:40 tgr_: running 'mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=ukwiki' on terbium
* 19:57 tgr_: backports done
* 19:52 bblack: dns1001,authdns1001 - upgrade gdnsd to 3.5.0
* 19:49 tgr_: synced Config: [[gerrit:654520{{!}}Disable DiscussionTools' upcoming newtopictool (T270119)]]
* 19:49 tgr_: synced Config: [[gerrit:655723{{!}}Migrate HomepageVisit and ServerSideAccountCreation to Event Platform on testwiki (T267333)]]
* 19:48 tgr_: synced Config: [[gerrit:655706{{!}}Migrate SuggestedTagsAction to Event Platform on testwiki (T267351)]]
* 19:48 tgr_: synced Config: [[gerrit:655301{{!}}Alphabetize ORES settings (T256887)]]
* 19:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655302{{!}}Enable ORES filters on ukwiki (T256887)]] (duration: 01m 05s)
* 19:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bunch of no-op/testwiki changes: [[gerrit:654520]], [[gerrit:655301]], [[gerrit:655706]], [[gerrit:655723]] (duration: 01m 05s)
* 19:27 bblack: dns3001,dns4001 - upgrade gdnsd to 3.5.0
* 19:25 ottomata: rolling restart of eventgate-analytics-external pods to clear schema caches - [[phab:T267333|T267333]]
* 19:01 ariel@deploy1001: Synchronized php-1.36.0-wmf.26/includes/api/ApiQueryInfo.php: Backport: (gerrit 655671) Fix undefined index error in ApiQueryInfo ([[phab:T271815|T271815]]) (duration: 01m 06s)
* 18:06 bblack: dns2001,dns5001 - upgrade gdnsd to 3.5.0
* 17:40 bblack: dnsX002 - upgrade gdnsd to 3.5.0
* 17:20 herron: roll restarting eqiad/codfw low-traffic pybals for kibana-next -> kibana7 rename
* 17:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:09 jynus: shutting down db2132, db2078:m1 for m1 codfw replica reprovisioning [[phab:T270877|T270877]]
* 17:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:09 moritzm: rebooting people1002 (people.wikimedia.org)
* 16:56 moritzm: reinstalling bast3005 with correct DHCP settings
* 16:39 herron@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: cluster=kibana7,service=kibana7
* 16:37 ema: cp5008: ats-backend-restart to apply jit.off(true, true) to all lua scripts [[phab:T265625|T265625]]
* 16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:18 herron@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash2031.codfw.wmnet
* 15:56 ema: cp5008: ats-backend-restart to apply jit.off(true, true) in default.lua [[phab:T265625|T265625]]
* 15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
* 15:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
* 15:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
* 15:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
* 14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.26
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:55 moritzm: draining ganeti3003 for eventual reboot
* 13:53 moritzm: failover ganeti master in esams to ganeti3002
* 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:33 moritzm: draining ganeti3002 for eventual reboot
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:08 moritzm: draining ganeti3001 for eventual reboot
* 11:22 moritzm: installing edk2 security updates
* 10:51 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: [[phab:T271058|T271058]]
* 10:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: [[phab:T271058|T271058]]
* 10:26 moritzm: installing systemd bugfix update from Buster 10.7 point release
* 10:15 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.26 (duration: 67m 18s)
* 10:13 marostegui: Restart mysql on db1138 to pick up new config [[phab:T271427|T271427]] [[phab:T271106|T271106]]
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13736 and previous config saved to /var/cache/conftool/dbconfig/20210112-101211-marostegui.json
* 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.26
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13732 and previous config saved to /var/cache/conftool/dbconfig/20210112-090533-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13731 and previous config saved to /var/cache/conftool/dbconfig/20210112-085030-root.json
* 08:49 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T271755|T271755]] (duration: 00m 57s)
* 08:47 liw: 1.36.0-wmf.26 was branched at {{Gerrit|e6ad9ab7713ee33c30cd7c17762737870dc8fd08}} for [[phab:T267419|T267419]]
* 08:40 marostegui: Sanitize bclwiktionary diqwiktionary niawiki niawiktionary diqwiktionary on db1124  db2094 {{Gerrit|db11154}} [[phab:T270280|T270280]] [[phab:T270276|T270276]] [[phab:T270414|T270414]] [[phab:T270410|T270410]] [[phab:T271261|T271261]]
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13730 and previous config saved to /var/cache/conftool/dbconfig/20210112-083526-root.json
* 08:30 moritzm: installing remaining curl security updates on stretch
* 08:21 marostegui: Deploy schema change on s3 eqiad master - [[phab:T270187|T270187]]
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13729 and previous config saved to /var/cache/conftool/dbconfig/20210112-082023-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P13728 and previous config saved to /var/cache/conftool/dbconfig/20210112-080419-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13727 and previous config saved to /var/cache/conftool/dbconfig/20210112-070051-root.json
* 06:53 XioNoX: push CR655445, only configure vlans relevant to a switch
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13726 and previous config saved to /var/cache/conftool/dbconfig/20210112-064548-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13725 and previous config saved to /var/cache/conftool/dbconfig/20210112-063044-root.json
* 06:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.36.0-wmf.21 (duration: 03m 21s)
* 06:16 marostegui: Stop mysql on db1079 to clone db1155:3317 [[phab:T268742|T268742]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13724 and previous config saved to /var/cache/conftool/dbconfig/20210112-061541-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13723 and previous config saved to /var/cache/conftool/dbconfig/20210112-060557-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P13722 and previous config saved to /var/cache/conftool/dbconfig/20210112-055953-marostegui.json


== 2021-01-11 ==
== 2021-10-08 ==
* 22:16 eileen: process-control config revision is {{Gerrit|f08249ecf9}} eoy jobs disabled
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 22:12 eileen: civicrm revision changed from {{Gerrit|2df572bdcd}} to {{Gerrit|f417a510a5}}, config revision is {{Gerrit|f08249ecf9}}
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 21:58 Amir1: deleting watchlist entries of Fawikibot in fawiki (1.1M rows)
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 21:20 mutante: docker images - [deneb:/srv/images/production-images] $ sudo -i build-production-images
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 21:02 bblack: dns4002 - upgrade gdnsd to 3.5.0 package
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 20:47 bblack: authdns2001 - upgrade gdnsd to 3.5.0 package
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate UniversalLanguageSelector to Event Platform - [[phab:T268517|T268517]] (duration: 00m 57s)
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 19:43 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T270417|T270417]] [[phab:T270413|T270413]] [[phab:T270279|T270279]])
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 19:14 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T270417|T270417]] [[phab:T270413|T270413]] [[phab:T270279|T270279]])
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 18:48 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277 (duration: 00m 04s)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 18:48 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 18:01 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: [[phab:T181217|T181217]] (duration: 00m 56s)
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 18:00 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T181217|T181217]] (duration: 00m 57s)
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 17:57 reedy@deploy1001: Synchronized wmf-config/extension-list: [[phab:T181217|T181217]] (duration: 00m 56s)
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 17:48 Amir1: manually removing watchlist rows for Dexbot in Wikidata
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 17:40 mutante: deploy2002 - scap pull
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:39 mutante: deploy1002 - scap pull
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:15 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 59s)
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 17:13 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]]
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:12 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 05s)
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:10 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for [[phab:T271621|T271621]]
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 16:48 Urbanecm: Create new wiki window is completed
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 16:43 andrew@deploy1001: Finished deploy [striker/deploy@3180f72]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 01s)
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:42 andrew@deploy1001: Started deploy [striker/deploy@3180f72]: Striker deploy for [[phab:T271621|T271621]]
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s)
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 16:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 56s)
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 16:33 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 56s)
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 16:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating bclwiktionary ([[phab:T270274|T270274]])
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:30 urbanecm@deploy1001: Synchronized dblists: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 55s)
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:29 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 55s)
* 06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:26 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating bclwiktionary ([[phab:T270274|T270274]]) (duration: 00m 54s)
* 06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 16:25 moritzm: installing openldap security updates on stretch (client tools/libs only; all slapd installations are on Buster and already fixed)
* 05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 56s)
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 16:20 andrew@deploy1001: Finished deploy [striker/deploy@ba6c0ae]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 02s)
* 04:32 ryankemper: [[phab:T292814|T292814]] Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id [[phab:T292814|T292814]]` on `ryankemper@cumin1001` tmux `elastic`
* 16:18 andrew@deploy1001: Started deploy [striker/deploy@ba6c0ae]: Striker deploy for [[phab:T271621|T271621]]
* 04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 01m 34s)
* 04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 16:17 moritzm: installing remaining p11-kit security updates on stretch
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 16:15 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating diqwiktionary ([[phab:T270275|T270275]])
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 16:14 urbanecm@deploy1001: Synchronized dblists: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 57s)
* 04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
* 16:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 57s)
* 04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating diqwiktionary ([[phab:T270275|T270275]]) (duration: 00m 55s)
* 04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 55s)
* 04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:04 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiktionary ([[phab:T270409|T270409]])
* 04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
* 16:03 urbanecm@deploy1001: Synchronized dblists: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 55s)
* 04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
* 16:02 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
* 16:01 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiktionary ([[phab:T270409|T270409]]) (duration: 00m 56s)
* 03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 15:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 15:57 andrew@deploy1001: Finished deploy [striker/deploy@b2804f2]: Striker deploy for [[phab:T271621|T271621]] (duration: 02m 05s)
* 02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' {{!}} mwscript purgeList.php , ref [[phab:T287425|T287425]], [[phab:T292810|T292810]]
* 15:56 urbanecm@deploy1001: Synchronized langlist: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 53s)
* 00:07 tgr_: deploy window over
* 15:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 55s)
* 00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498{{!}}Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)
* 15:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
* 15:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 15:55 andrew@deploy1001: Started deploy [striker/deploy@b2804f2]: Striker deploy for [[phab:T271621|T271621]]
* 15:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 56s)
* 15:54 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiki ([[phab:T270408|T270408]])
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
* 15:52 urbanecm@deploy1001: Synchronized dblists: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 57s)
* 15:51 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 57s)
* 15:50 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiki ([[phab:T270408|T270408]]) (duration: 00m 56s)
* 15:48 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 00m 43s)
* 15:47 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 15:47 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 45s)
* 15:45 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 15:42 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]] (duration: 01m 04s)
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13720 and previous config saved to /var/cache/conftool/dbconfig/20210111-154123-root.json
* 15:41 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for [[phab:T271621|T271621]]
* 15:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate - [[phab:T268517|T268517]] (duration: 00m 58s)
* 15:32 effie: upgrading python-thumbor-wikimedia to 2.9 on thumbor1001
* 15:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13719 and previous config saved to /var/cache/conftool/dbconfig/20210111-152619-root.json
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13718 and previous config saved to /var/cache/conftool/dbconfig/20210111-151116-root.json
* 15:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13717 and previous config saved to /var/cache/conftool/dbconfig/20210111-145612-root.json
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P13716 and previous config saved to /var/cache/conftool/dbconfig/20210111-145239-marostegui.json
* 14:32 XioNoX: add Routinator 0.8.2 to APT repo - [[phab:T269738|T269738]]
* 14:22 moritzm: restarting FPM/Apache on app server canaries for curl update
* 14:13 marostegui: Deploy schema change on s3 codfw master - [[phab:T270187|T270187]]
* 13:52 moritzm: installing curl security updates on stretch
* 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13713 and previous config saved to /var/cache/conftool/dbconfig/20210111-134213-root.json
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13712 and previous config saved to /var/cache/conftool/dbconfig/20210111-132709-root.json
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13711 and previous config saved to /var/cache/conftool/dbconfig/20210111-131206-root.json
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:655418{{!}} Bumping portals to master (T128546)]] (duration: 01m 03s)
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:655418{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 11:10 XioNoX: upgrade Routinator to 0.8.2 on rpki2001 - [[phab:T269738|T269738]]
* 11:10 jbond42: push change to ratelimit vscode-phabricator - [[phab:T271528|T271528]]
* 10:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e}}: Enable anniversary logo for cs.wikipedia ([[phab:T271662|T271662]]; 2/2) (duration: 00m 56s)
* 10:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|ab9e80dad5c44ff72a6fa7568a5ba59798df3d4e}}: Enable anniversary logo for cs.wikipedia ([[phab:T271662|T271662]]; 1/2) (duration: 01m 00s)
* 10:06 ema: cp3050: restart ats-be to lower lua states from 256 to 64 [[phab:T265625|T265625]]
* 09:31 marostegui: Sanitize db1155:3314 - [[phab:T268742|T268742]]
* 09:31 marostegui: Deploy schema change on s1 codfw master - [[phab:T270187|T270187]]
* 09:02 elukey: force puppet on logstash1007 after ES OOM
* 08:55 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
* 08:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
* 08:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
* 08:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
* 07:49 dcausse: depooling & restarting blazegraph on wdqs2007 ([[phab:T242453|T242453]])
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13709 and previous config saved to /var/cache/conftool/dbconfig/20210111-074853-root.json
* 07:43 dcausse: repool wdqs1007 (wrong machine) ([[phab:T242453|T242453]])
* 07:41 dcausse: depooling & restarting blazegraph on wdqs1007 ([[phab:T242453|T242453]])
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13708 and previous config saved to /var/cache/conftool/dbconfig/20210111-073349-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13707 and previous config saved to /var/cache/conftool/dbconfig/20210111-071846-root.json
* 07:12 marostegui: Deploy schema change on s8 codfw master - [[phab:T270187|T270187]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13706 and previous config saved to /var/cache/conftool/dbconfig/20210111-070342-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P13704 and previous config saved to /var/cache/conftool/dbconfig/20210111-065640-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13703 and previous config saved to /var/cache/conftool/dbconfig/20210111-065550-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13702 and previous config saved to /var/cache/conftool/dbconfig/20210111-064046-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13701 and previous config saved to /var/cache/conftool/dbconfig/20210111-063226-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P13700 and previous config saved to /var/cache/conftool/dbconfig/20210111-063155-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P13699 and previous config saved to /var/cache/conftool/dbconfig/20210111-063124-marostegui.json
* 06:04 marostegui: Depool db1121 to clone db1155:3314
* 06:04 marostegui: Deploy schema change on s7 codfw master - [[phab:T270187|T270187]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13698 and previous config saved to /var/cache/conftool/dbconfig/20210111-060342-marostegui.json


== 2021-01-09 ==
== 2021-10-07 ==
* 00:11 mutante: puppetmaster2003 - restarted apache after spewing 500s
* 23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 3/3 (duration: 00m 55s)
* 23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 2/3 (duration: 00m 55s)
* 23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 1/3 (duration: 00m 56s)
* 23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 2/2 (duration: 00m 56s)
* 23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 1/2 (duration: 00m 57s)
* 21:35 urbanecm: Password reset for SUL User:LA2-bot ([[phab:T292793|T292793]])
* 20:43 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3
* 20:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281167|T281167]]
* 20:35 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:35 cmooney@cumin1001: START - Cookbook sre.network.cf
* 20:23 krinkle@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Gadgets/: {{Gerrit|I7c858b8c4bc}} (duration: 00m 56s)
* 20:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Echo/: {{Gerrit|8a7ff05ba28f302adb581bf430a868bb815b4ffd}}: Revert "Use namespaced CentralAuthSessionProvider" (duration: 00m 57s)
* 19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/CentralAuth/: {{Gerrit|c01c2e4983bad8582ddd62aeb35ac9be852d493b}}: Revert "Namespace session providers" (duration: 00m 57s)
* 19:44 urbanecm: Backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/727489, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/727487 in an unsafe way -- exceptions at testwikis expected, wmf.3 is not deployed elsewhere, so this should be ok
* 19:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert all wikis to 1.38.0-wmf.2 ([[phab:T281167|T281167]])
* 19:33 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): variously blocked, rolling back to testwikis for safe deploy of backports
* 19:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.38.0-wmf.2
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 19:03 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to all wikis
* 18:50 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=test2wiki
* 18:46 sukhe: running authdns-update for [[phab:T292537|T292537]]
* 18:29 urbanecm: Morning B&C window done
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a946c046ae17a520f8d3463a16b1435ceb4856c}}: Deploy Growth mentor dashboard to pilot wikis ([[phab:T278920|T278920]]) (duration: 01m 04s)
* 18:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 03s)
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 04s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|31770f2b3660e7d7490c0a9ab66285c1f069732d}}: shwiki: Deploy Growth features to newcomers ([[phab:T278240|T278240]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|33526dfed148068585289f5ac501feda72068fd9}}: Stream config changes for android_daily_stats schema ([[phab:T286000|T286000]]) (duration: 01m 06s)
* 18:10 ejegg: updated payments-wiki from {{Gerrit|6d3560d083}} to {{Gerrit|030b11da1a}}
* 18:07 arnoldokoth: gitlab2001 re-image complete ([[phab:T283076|T283076]])
* 17:30 mutante: rebooting gitlab2001.wikimedia.org
* 16:56 arnoldokoth: downtiming gitlab2001 for re-imaging ([[phab:T283076|T283076]])
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 16:32 hnowlan: roll restarting maps cassandra instances for java updates
* 16:19 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:19 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001 (duration: 00m 08s)
* 15:07 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001 (duration: 00m 10s)
* 14:49 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001
* 14:48 hashar: Upgrading Gerrit replica to 3.3.6 # [[phab:T290236|T290236]]
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:56 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:29 hashar: restarting CI Jenkins for git plugin update
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 hashar: Upgraded CI Jenkins on contint2001
* 13:14 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:13 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:06 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 13:05 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:05 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:40 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 12:16 moritzm: installing testvm2005
* 11:59 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 11:52 Lucas_WMDE: EU backport+config window (aka UTC morning) done
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725858{{!}}Enable Content and Section Translation to Kurdish WP (T290238)]] (duration: 01m 04s)
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/WikidataPageBanner/includes/WikidataPageBannerFunctions.php: Backport: [[gerrit:727188{{!}}Change PropertyId to NumericPropertyId (T289125, T292667)]] (duration: 01m 05s)
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 jbond: update puppet stdlib gerrit:726872
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2004.codfw.wmnet
* 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host ms-be2045.codfw.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2005.codfw.wmnet
* 09:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2004.codfw.wmnet
* 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2005.codfw.wmnet
* 08:49 mvernon@cumin2002: START - Cookbook sre.experimental.reimage for host ms-be2045.codfw.wmnet
* 08:36 moritzm: imported jenkins 2.303.2 to thirdparty/ci component for buster-wikimedia
* 07:57 Emperor: re-enabling puppet on ms-be2045 after hw work [[phab:T290881|T290881]]
* 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 07:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 07:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 07:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 07:34 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 07:33 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 07:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 06:21 ryankemper: [Elastic] Restart of `relforge` complete
* 06:05 ryankemper: [Elastic] Cluster in green status, proceeding to next and final node => `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 05:53 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 05:48 ryankemper: [Elastic] Performing rolling restarts of `relforge`. `relforge1003` is the master so I'll restart `relforge1004` first to minimize disruption
* 03:00 ejegg: updated payments-wiki from {{Gerrit|23d0ffac66}} to {{Gerrit|6d3560d083}}
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: enable Parsoid API everywhere (duration: 01m 04s)
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 mutante: [grafana2001:~] $ sudo systemctl start rsync-var-lib-grafana  because of "PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded" because of some race condition where a file vanished during sync


== 2021-01-08 ==
== 2021-10-06 ==
* 19:48 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: Striker deploy for [[phab:T269004|T269004]] (duration: 02m 11s)
* 23:57 mutante: releases2002 - rm /srv/org/wikimedia/reprepro/conf/distributions - contains only jessie-mediawiki - see 725670 and EOL of MediaWiki 1.31
* 19:45 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: Striker deploy for [[phab:T269004|T269004]]
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:28 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches (duration: 02m 35s)
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:26 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches
* 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:02 joal@deploy1001: Finished deploy [analytics/refinery@db9da3c] (thin): Hotfix analytics deployment - THIN [analytics/refinery@db9da3c] (duration: 00m 07s)