Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(urbanecm@deploy1001: Synchronized php-1.36.0-wmf.20/vendor/: 3278ffd107888757c4620383160a6d5fa67d05b5: Bump wikimedia/parsoid to v0.13.0-a19 (T269685) (duration: 01m 16s))
imported>Stashbot
(ejegg: updated payments-wiki from 756c2f7ce0 to df80a99b40)
Line 1: Line 1:
== 2020-12-10 ==
* 00:42 ejegg: updated payments-wiki from {{Gerrit|756c2f7ce0}} to {{Gerrit|df80a99b40}}
* 00:26 robh: cr2-eqsin bad fan being swapped via [[phab:T267544|T267544]]
== 2020-12-09 ==
* 23:21 mutante: repooling parse2001 after buster reimage - [[phab:T268524|T268524]]
* 23:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
* 23:16 mutante: repooling parse2001 after buster reimage - [[phab:T245757|T245757]]
* 23:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=parse2001.codfw.wmnet
* 23:04 mutante: zero.wikimedia.beta.wmflabs.org removed from beta_sites (deployment-prep) [[phab:T187716|T187716]]
* 22:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:40 bstorm: shutting down labstore1006 for maintenance [[phab:T268285|T268285]]
* 20:27 mutante: mw1281,mw1282,mw1283 - scap pull
* 20:26 mutante: repooling mw1281,mw1282,mw1283 - now in rack A8
* 20:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw128[1-3].eqiad.wmnet
* 20:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1283.eqiad.wmnet
* 20:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1282.eqiad.wmnet
* 20:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1281.eqiad.wmnet
* 20:18 twentyafterfour: wmf.21 looks good on group1 wikis. Still seeing [[phab:T269603|T269603]] but not at an increased rate. (refs [[phab:T264801|T264801]])
* 20:13 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.21 (duration: 01m 02s)
* 20:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.21
* 19:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce01bbe7b05eda8065fc57c865a69370e8aae797}}: Enable ArticlePlaceholder at papwiki ([[phab:T223693|T223693]]) (duration: 01m 02s)
* 19:17 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.21/includes/page/Article.php: deploy {{Gerrit|0d99fe6d54}} Article::view - remove the old subtitle from doOutputFromParserCache. Bug: [[phab:T269727|T269727]] (duration: 01m 04s)
* 18:59 mutante: testreduce1001 - installed make
* 18:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2243.codfw.wmnet
* 18:16 mutante: depooling mw2243 (jobrunner) for reimaging ([[phab:T245757|T245757]])
* 18:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
* 18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:05 mutante: mw1281,mw1282,mw1283 shut down for [[phab:T266164|T266164]]
* 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:59 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:58 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:57 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 17:24 mutante: depooling 3 API appservers in eqiad to physically move to another rack
* 17:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 16:52 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:49 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:48 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:10 ema: deployment-cache-text06: deploy varnish 6.0.0-1wm1 [[phab:T264398|T264398]]
* 16:06 moritzm: updating mwdebug1003, parse2001, deploy1002, deploy2002 to wikidiff2 1.10.0-1~wmf1+buster1
* 16:05 moritzm: importing wikidiff2 1.10.0-1~wmf1+buster1 to component/php72 [[phab:T250515|T250515]]
* 15:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1001.eqiad.wmnet
* 15:47 hnowlan: reimaging restbase2009 after disk replacement
* 15:29 moritzm: restarting nginx on htmldump1001 to pick up OpenSSL security updates
* 13:54 godog: experiment with rsync.service increased niceness on ms-be2057 - [[phab:T269337|T269337]]
* 13:27 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 13:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:03 XioNoX: standardize Private-Peer BGP group on all cr*
* 12:30 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1001.eqiad.wmnet
* 12:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:24 Urbanecm: EU B&C window done
* 12:23 urbanecm@deploy1001: Synchronized w/static.php: {{Gerrit|cfb36023ac873c00e680032999b7c21c2a105132}}: Remove unsupported arg in MediaWiki::doPostOutputShutdown() call (duration: 01m 02s)
* 12:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
* 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3414289c8c7272185e30cacc3df5d5dbc719219d}}: Add extended-confirmed group and restriction level for bgwiki ([[phab:T269709|T269709]]) (duration: 01m 19s)
* 11:06 godog: reboot ms-be1019 / ms-be1020 - [[phab:T268435|T268435]]
* 10:56 godog: change librenms alerts and transport groups to use alertmanager - [[phab:T267018|T267018]]
* 10:45 moritzm: installing openssl updates on Buster
* 09:24 jbond42: make message mandatory for disable-puppet
* 09:03 godog: swift codfw-prod: add ms-be20[58-61] - [[phab:T269337|T269337]]
* 01:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgAbuseFilterAflFilterMigrationStage ahead of train roll-out [[phab:T269712|T269712]] (duration: 01m 03s)
* 00:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.20/vendor/: {{Gerrit|3278ffd107888757c4620383160a6d5fa67d05b5}}: Bump wikimedia/parsoid to v0.13.0-a19 ([[phab:T269685|T269685]]) (duration: 01m 16s)


Line 650: Line 722:
* 00:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
== 2020-11-30 ==
* 23:12 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:08 mutante: parse2001 - sudo -i /usr/local/sbin/restart-php7.2-fpm
* 23:08 mutante: sudo -i /usr/local/sbin/restart-php7.2-fpm
* 22:45 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 22:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove 1.34 from $wgExtDistSnapshotRefs [[phab:T268931|T268931]] (duration: 00m 57s)
* 22:34 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 22:21 cdanis@deploy1001: Synchronized docroot/thankyou: Also serve apple-app-site-assoc file from /.well-known/ [[phab:T259312|T259312]] {{Gerrit|bc52d1481}} (duration: 00m 57s)
* 22:15 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 22:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:14 mutante: parse2001 - systemctl restart ferm - had to restart ferm after reimaging (though there weren't any alerts about that) but it fixed running httpbb tests on it ([[phab:T268524|T268524]])
* 22:13 ejegg: extended and re-synchronized timing of thank you mail sender and donation queue consumer
* 21:51 mutante: parse2001 - scap pull
* 21:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 21:45 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:38 razzi@cumin1001: START - Cookbook sre.hosts.decommission
* 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:47 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:47 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:42 mutante: reimaging deploy2002 with buster (not active, deploy1001/2001 are) [[phab:T265963|T265963]]
* 20:39 mutante: reimaging parse2001 (parsoid canary) with buster ([[phab:T268524|T268524]])
* 20:36 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
* 20:33 mutante: depooling parse2001 to prepare for reimage [[phab:T268524|T268524]]
* 20:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 20:28 mutante: reimaging deploy1002 with buster - not the active deployment server, deploy1001 still is ([[phab:T265963|T265963]])
* 20:10 ariel@deploy1001: Finished deploy [dumps/dumps@2f4d931]: per job batches for page content. step one. (duration: 00m 04s)
* 20:10 ariel@deploy1001: Started deploy [dumps/dumps@2f4d931]: per job batches for page content. step one.
* 19:52 papaul: power down ms-be2059 for RAID re-configuration
* 19:47 mutante: added Sukhbir to Ops vendor maintenance calendar permissions to make changes and share like all of SRE ([[phab:T229860|T229860]])
* 19:23 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:644236 Decrease OAuth token expiration (duration: 00m 56s)
* 19:17 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:644243 group2: switch ParserCache to JSON (duration: 00m 58s)
* 19:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:47 joal@deploy1001: Finished deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d] (duration: 00m 08s)
* 17:47 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:47 joal@deploy1001: Started deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d]
* 17:43 joal@deploy1001: Finished deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d] (duration: 11m 32s)
* 17:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:31 joal@deploy1001: Started deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d]
* 17:07 moritzm: reset failed (now obsolete) idp-u2f-sync/stunnel4 services on idp1001
* 16:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1008.eqiad.wmnet
* 16:24 volans: uploaded spicerack_0.0.45 to apt.wikimedia.org buster-wikimedia
* 16:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet (duration: 01m 15s)
* 16:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet
* 15:43 moritzm: installing tomcat8 security updates
* 15:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1007.eqiad.wmnet
* 15:34 ema: cp3054: upgrade varnish to 6.0.7-1wm1 [[phab:T268736|T268736]] [[phab:T264398|T264398]]
* 15:28 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on all wikis - [[phab:T268517|T268517]] (duration: 00m 56s)
* 15:15 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on testwiki - [[phab:T268517|T268517]] (duration: 01m 16s)
* 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883] (duration: 00m 08s)
* 14:55 joal@deploy1001: Started deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883]
* 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883] (duration: 09m 26s)
* 14:45 joal@deploy1001: Started deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883]
* 14:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1006.eqiad.wmnet
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13481 and previous config saved to /var/cache/conftool/dbconfig/20201130-143232-root.json
* 14:23 marostegui: Deploy schema change on s3 codfw, lag will show up on s3 codfw [[phab:T268004|T268004]]
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P13480 and previous config saved to /var/cache/conftool/dbconfig/20201130-141953-marostegui.json
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13479 and previous config saved to /var/cache/conftool/dbconfig/20201130-141729-root.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13478 and previous config saved to /var/cache/conftool/dbconfig/20201130-141146-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13477 and previous config saved to /var/cache/conftool/dbconfig/20201130-140226-root.json
* 13:58 ema: varnish 6.0.7-1wm1 uploaded to apt.wikimedia.org component/varnish6 [[phab:T268736|T268736]]
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13475 and previous config saved to /var/cache/conftool/dbconfig/20201130-134841-marostegui.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13474 and previous config saved to /var/cache/conftool/dbconfig/20201130-134722-root.json
* 13:23 jbond42: update zeromq on jessie hosts
* 13:21 dcausse: depooling wdqs1004 (lag)
* 13:18 moritzm: CAS enabled for racktables
* 13:16 gilles@deploy1001: Synchronized debug.json: [[phab:T268167|T268167]] Add mwdebug1003 to list of debug servers (duration: 00m 56s)
* 12:50 Urbanecm: EU B&C window done
* 12:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3476644e4c27dd28339f7b10c8871be2e9455394}}: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 ([[phab:T268849|T268849]]; 2nd attempt) (duration: 00m 56s)
* 12:43 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic (duration: 00m 24s)
* 12:43 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic
* 12:41 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5585fd79119a4e077705789d1c1928c9e9efa956}}: Enable RelatedArticles on ptwikinews ([[phab:T268945|T268945]]) (duration: 00m 57s)
* 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba6d0f8fd2a443e5c913a292365063f01f2d076b}}: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 ([[phab:T268849|T268849]]) (duration: 00m 57s)
* 12:37 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts (duration: 02m 05s)
* 12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9942d68c914a56073d2d192434ba24ff8cb921ba}}: Assign patrolmarks right to autoconfirmed users on itwiki ([[phab:T268734|T268734]]) (duration: 00m 57s)
* 12:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
* 12:35 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts
* 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 24s)
* 12:34 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
* 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 51s)
* 12:33 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
* 12:32 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 12:27 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: New eqiad maps hosts (duration: 00m 03s)
* 12:27 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: New eqiad maps hosts
* 12:24 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts (duration: 00m 03s)
* 12:24 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts
* 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|2922abe7b810f1b53b446af783dc9d51e6585225}}: Remove wgContentTranslationRESTBase config ([[phab:T266213|T266213]]) (duration: 00m 57s)
* 11:43 marostegui: Sanitize clouddb1016:3318 - [[phab:T267090|T267090]]
* 11:38 ema: A:cp upgrade fifo-log-demux to 0.6.2 [[phab:T268883|T268883]]
* 11:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:644196{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 11:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:644196{{!}} Bumping portals to master (T128546)]] (duration: 01m 01s)
* 11:32 ariel@deploy1001: Finished deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir (duration: 00m 04s)
* 11:32 ariel@deploy1001: Started deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir
* 11:28 ema: upload fifo-log-demux 0.6.2 to buster-wikimedia [[phab:T268883|T268883]]
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13473 and previous config saved to /var/cache/conftool/dbconfig/20201130-111321-root.json
* 11:00 hnowlan: bootstrapping maps1005 cassandra
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13472 and previous config saved to /var/cache/conftool/dbconfig/20201130-105818-root.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13471 and previous config saved to /var/cache/conftool/dbconfig/20201130-104314-root.json
* 10:29 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:29 marostegui: Compare data between clouddb1014:3312 clouddb1018:3312 labsdb1012 [[phab:T267090|T267090]]
* 10:29 marostegui: Compare data between clouddb1012:3312 clouddb1018:3312 labsdb1012 [[phab:T267090|T267090]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13470 and previous config saved to /var/cache/conftool/dbconfig/20201130-102811-root.json
* 10:24 akosiaris: applying https://gerrit.wikimedia.org/r/q/topic:%22k8s_config%22 series of patches
* 10:18 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:18 ema: cp4031: reboot to test atsmtail/fifo-log-demux service dependencies -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/643922 [[phab:T256467|T256467]]
* 10:11 ema: cp4032: upgrade varnish to 6.0.7-1wm1 [[phab:T268736|T268736]]
* 10:06 moritzm: installing NSS security updates
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P13469 and previous config saved to /var/cache/conftool/dbconfig/20201130-095729-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13468 and previous config saved to /var/cache/conftool/dbconfig/20201130-095621-root.json
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13467 and previous config saved to /var/cache/conftool/dbconfig/20201130-094117-root.json
* 09:40 marostegui: Stop MySQL on db1087 to clone clouddb1016:3318 ([[phab:T267090|T267090]])
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from s8 and pool db1092 instead temporarily on vslow [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13466 and previous config saved to /var/cache/conftool/dbconfig/20201130-093909-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13465 and previous config saved to /var/cache/conftool/dbconfig/20201130-092614-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1089+ (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13464 and previous config saved to /var/cache/conftool/dbconfig/20201130-092154-root.json
* 08:51 marostegui: Deploy schema change on db1089
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P13463 and previous config saved to /var/cache/conftool/dbconfig/20201130-085101-marostegui.json
* 08:41 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:36 marostegui: Compare data between clouddb1016:3315 labsdb1012 [[phab:T267090|T267090]]
* 07:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:11 marostegui: Deploy schema change on s1 codfw - [[phab:T268004|T268004]]
* 07:05 marostegui: Stop mysql on db1124:3318 to clone clouddb1016:3318, lag will show up on wikireplicas on s8 [[phab:T267090|T267090]]
* 06:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 04:26 kart_: Updated cxserver to 2020-11-23-050106-production ([[phab:T262253|T262253]], [[phab:T268410|T268410]])
* 04:18 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:14 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:11 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
== 2020-11-27 ==
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:50 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 15:06 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 14:50 elukey: roll restart zookeeper on druid* nodes for openjdk upgrades
* 14:50 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 10:52 jayme: updated helmfile to 0.135.0-1 on deploy*,contint*
* 10:51 jayme: updated helm-diff to 3.1.3-1 on contint*
* 10:49 jayme: updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum*
* 10:06 jayme: updated helm and helmfile on deploy2001
* 10:04 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:00 jayme: imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:05 elukey: roll restart druid public cluster for openjdk upgrades
* 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 06:39 marostegui: Stop mysql on es1015 [[phab:T268810|T268810]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json
* 06:30 marostegui: Remove es1016 from tendril and zarcillo [[phab:T268812|T268812]]
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for decommissioning [[phab:T268810|T268810]]', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json
== 2020-11-26 ==
* 17:18 jayme: downgrade helmfile to 0.125.2-1 on deploy*
* 17:05 jayme: updated helm-diff and helmfile on deploy100* and deploy200*
* 16:34 jayme: imported helm-diff 3.1.3-1 into buster-wikimedia and stretch-wikimedia
* 15:01 moritzm: installing libonig security updates
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13452 and previous config saved to /var/cache/conftool/dbconfig/20201126-144446-root.json
* 14:38 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 14:36 moritzm: installing zeromq3 security updates for stretch
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 14:35 jbond42: failing idp back to idp2001
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13451 and previous config saved to /var/cache/conftool/dbconfig/20201126-142942-root.json
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 14:23 moritzm: remove labtestpuppetmaster2001 from debmonitor [[phab:T258103|T258103]]
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13450 and previous config saved to /var/cache/conftool/dbconfig/20201126-141439-root.json
* 13:52 elukey: roll restart druid daemons on druid analytics to pick up new openjdk upgrades
* 13:52 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:52 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:52 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 13:50 moritzm: installing python3.5 security updates
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P13449 and previous config saved to /var/cache/conftool/dbconfig/20201126-133204-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13448 and previous config saved to /var/cache/conftool/dbconfig/20201126-132918-root.json
* 13:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13447 and previous config saved to /var/cache/conftool/dbconfig/20201126-131414-root.json
* 13:07 hnowlan: testing depooling kartotherian on maps2004 to reduce load
* 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:01 jbond42: update puppet_compiler on compiler1003
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13446 and previous config saved to /var/cache/conftool/dbconfig/20201126-125911-root.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P13445 and previous config saved to /var/cache/conftool/dbconfig/20201126-124253-marostegui.json
* 12:31 jbond42: fail over idp.wikimedia.org
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:53 moritzm: rebooting seaborgium for kernel update
* 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:40 marostegui: Deploy schema change on s8 codfw - there will be lag on s8 codfw - [[phab:T268004|T268004]]
* 11:16 moritzm: restarting archiva to pick up Java security update
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13442 and previous config saved to /var/cache/conftool/dbconfig/20201126-104324-root.json
* 10:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13441 and previous config saved to /var/cache/conftool/dbconfig/20201126-102820-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13440 and previous config saved to /var/cache/conftool/dbconfig/20201126-101317-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13439 and previous config saved to /var/cache/conftool/dbconfig/20201126-095813-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P13438 and previous config saved to /var/cache/conftool/dbconfig/20201126-094729-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P13437 and previous config saved to /var/cache/conftool/dbconfig/20201126-094702-marostegui.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P13436 and previous config saved to /var/cache/conftool/dbconfig/20201126-094639-marostegui.json
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13435 and previous config saved to /var/cache/conftool/dbconfig/20201126-094538-root.json
* 09:38 marostegui: Stop mysql on es1016 for decommission
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13434 and previous config saved to /var/cache/conftool/dbconfig/20201126-093035-root.json
* 09:26 ema: deployment-cache-text06: upgrade Varnish to 6.0.7-1wm1 [[phab:T268736|T268736]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13433 and previous config saved to /var/cache/conftool/dbconfig/20201126-091532-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13432 and previous config saved to /var/cache/conftool/dbconfig/20201126-090028-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P13431 and previous config saved to /var/cache/conftool/dbconfig/20201126-084903-marostegui.json
* 08:40 elukey: roll restart cassandra on aqs10* for openjdk upgrades
* 08:40 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 08:09 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:08 marostegui: Deploy schema change on s7 codfw - there will be lag on s7 codfw - [[phab:T268004|T268004]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13430 and previous config saved to /var/cache/conftool/dbconfig/20201126-072506-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13429 and previous config saved to /var/cache/conftool/dbconfig/20201126-071514-root.json
* 07:12 marostegui: Enable GTID on clouddb1018:3317 clouddb1014:3317 [[phab:T267090|T267090]]
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13428 and previous config saved to /var/cache/conftool/dbconfig/20201126-071003-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13427 and previous config saved to /var/cache/conftool/dbconfig/20201126-070010-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13426 and previous config saved to /var/cache/conftool/dbconfig/20201126-065500-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13425 and previous config saved to /var/cache/conftool/dbconfig/20201126-064507-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13424 and previous config saved to /var/cache/conftool/dbconfig/20201126-063956-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1016 from dbctl', diff saved to https://phabricator.wikimedia.org/P13423 and previous config saved to /var/cache/conftool/dbconfig/20201126-063234-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13422 and previous config saved to /var/cache/conftool/dbconfig/20201126-063003-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13421 and previous config saved to /var/cache/conftool/dbconfig/20201126-062811-marostegui.json
* 06:17 marostegui: Stop mysql on db1124:3315 to clone clouddb1016:3315 [[phab:T267090|T267090]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for schema change', diff saved to https://phabricator.wikimedia.org/P13420 and previous config saved to /var/cache/conftool/dbconfig/20201126-061552-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P13419 and previous config saved to /var/cache/conftool/dbconfig/20201126-061459-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P13418 and previous config saved to /var/cache/conftool/dbconfig/20201126-061432-marostegui.json
* 06:08 ryankemper: [[phab:T268770|T268770]] [eqiad] Finished rolling restart of cirrus eqiad. All cirrus elasticsearch restarts are now complete (cloudelastic, relforge, eqiad, codfw)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 04:24 ryankemper: [[phab:T268770|T268770]] [eqiad] Begin rolling restart of cirrus eqiad, 3 nodes at a time
* 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 03:07 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I805699ecfa}} (duration: 00m 58s)
== 2020-11-25 ==
* 23:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:55 mutante: mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script ([[phab:T245757|T245757]])
* 22:47 ejegg: increased Ingenico API call timeout
* 22:34 shdubsh: beginning rolling restart of logstash cluster - eqiad
* 22:23 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:19 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:49 krinkle@deploy1001: Synchronized php-1.36.0-wmf.18/includes/libs/CSSMin.php: {{Gerrit|I26ed3e5e9a}} - fix [[phab:T268308|T268308]] (duration: 00m 59s)
* 20:43 mutante: LDAP added user duminasi to group wmf ([[phab:T266791|T266791]])
* 20:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 18:44 elukey: upload new hive* packages 2.2.3-2 to stretch-wikimedia - thirdparty/bigtop14 component
* 18:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 18:38 mutante: LDAP adding swagoel to NDA [[phab:T267314|T267314]]#6625628
* 18:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 18:05 ryankemper: [[phab:T268770|T268770]] [cloudelastic] Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
* 18:01 ryankemper: [cloudelastic] (forgot to mention this) Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
* 17:58 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts complete, service is healthy. This is done.
* 17:55 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1006` complete and all 3 elasticsearch clusters are green, all cloudelastic instances are now complete
* 17:49 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1005` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 17:44 shdubsh: beginning rolling restart of logstash cluster - codfw
* 17:44 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1004` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 17:39 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1003` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 17:39 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1002` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 17:28 ryankemper: [[phab:T268770|T268770]] [cloudelastic] restarts on `cloudelastic1001` complete and all 3 elasticsearch clusters are green, proceeding to next instance
* 17:22 ryankemper: [[phab:T268770|T268770]] Freezing writes to cloudelastic in preparation for restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint1002`
* 17:09 ryankemper: [[phab:T268770|T268770]] [cloudelastic] Downtimed `cloudelastic100[1-6]` in icinga in preparation for cloudelastic search elasticsearch cluster restart
* 17:05 ryankemper: [[phab:T268770|T268770]] Begin rolling restart of eqiad cirrus elasticsearch, 3 nodes at a time
* 17:04 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 17:00 godog: fail sdk on ms-be2031
* 16:49 godog: clean up sdk1 on / on ms-be2031
* 16:46 elukey: move analytics1066 to C3 - [[phab:T267065|T267065]]
* 16:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:21 mutante: puppetmaster - revoking old and signing new cert for mwdebug1003
* 16:11 elukey: move analytics1065 to C3 - [[phab:T267065|T267065]]
* 16:10 mutante: shutting down mwdebug1003 - reimaging for [[phab:T245757|T245757]]
* 16:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:02 moritzm: installing golang-1.7 updates for stretch
* 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:38 elukey: move stat1004 to A5 - [[phab:T267065|T267065]]
* 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:34 moritzm: removing maps2002 from debmonitor
* 15:10 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:04 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:56 moritzm: installing krb5 security updates for Buster
* 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:00 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:44 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 akosiaris: assign IPs to kubestage200<nowiki>{</nowiki>1,2,3<nowiki>}</nowiki>.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox [[phab:T268747|T268747]]
* 13:14 marostegui: Deploy schema change on commonswiki.watchlist on s4 codfw - there will be lag on s4 codfw - [[phab:T268004|T268004]]
* 13:08 akosiaris: assign IPs to kubestage200<nowiki>{</nowiki>1,2,3<nowiki>}</nowiki>.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13414 and previous config saved to /var/cache/conftool/dbconfig/20201125-124202-root.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13413 and previous config saved to /var/cache/conftool/dbconfig/20201125-122659-root.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13412 and previous config saved to /var/cache/conftool/dbconfig/20201125-121155-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13411 and previous config saved to /var/cache/conftool/dbconfig/20201125-115652-root.json
* 11:49 gilles@deploy1001: Finished deploy [performance/coal@be167b2]: [[phab:T268724|T268724]] (duration: 00m 06s)
* 11:48 gilles@deploy1001: Started deploy [performance/coal@be167b2]: [[phab:T268724|T268724]]
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P13408 and previous config saved to /var/cache/conftool/dbconfig/20201125-114717-marostegui.json
* 11:27 gilles@deploy1001: Finished deploy [performance/coal@468bc50]: [[phab:T268724|T268724]] (duration: 00m 06s)
* 11:27 gilles@deploy1001: Started deploy [performance/coal@468bc50]: [[phab:T268724|T268724]]
* 11:27 jbond42: install krb5 updates to jessie hosts
* 10:52 jbond42: failover idp primary to idp2001
* 10:51 kormat: deployed wmfmariadbpy 0.6.1 to `C:wmfmariadbpy`
* 10:43 kormat: uploaded wmfmariadbpy 0.6.1 to stretch+buster apt repos
* 10:21 jynus: upgrade wmfbackup-check package on alert* hosts
* 10:11 kormat: uploaded wmfmariadbpy 0.6 to stretch+buster apt repos
* 09:54 moritzm: uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json
* 09:45 marostegui: Manually install bsd-mailx (apt-get install bsd-mailx) on clouddb1015, labsdb1012 and labsdb1011 - [[phab:T268725|T268725]]
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json
* 08:43 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 [[phab:T268469|T268469]] (duration: 00m 59s)
* 08:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json
* 08:14 kormat: rebooting es1024 [[phab:T268469|T268469]]
* 08:08 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:07 kormat: stopping mariadb on es1024 [[phab:T268469|T268469]]
* 08:04 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es5 [[phab:T268469|T268469]] (duration: 00m 58s)
* 08:02 marostegui: Upgrade db2108
* 08:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json
* 06:38 marostegui: Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 [[phab:T267090|T267090]]
* 06:33 marostegui: Restart clouddb1019:3314, clouddb1019:3316
* 06:32 marostegui: Restart clouddb1015:3314, clouddb1015:3316
* 06:28 marostegui: Check private data on clouddb1014:3312 and clouddb1018:3312 [[phab:T267090|T267090]]
* 05:48 marostegui: Sanitize clouddb1014:3312 and clouddb1018:3312 [[phab:T267090|T267090]]
* 01:10 tgr_: Evening deploys done
* 01:07 tgr@deploy1001: Finished scap: Backport: [[gerrit:643156{{!}}GrowthExperiments: Add Russian aliases (T268519)]] (duration: 32m 09s)
* 00:35 tgr@deploy1001: Started scap: Backport: [[gerrit:643156{{!}}GrowthExperiments: Add Russian aliases (T268519)]]
== 2020-11-24 ==
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 23:50 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] (duration: 01m 51s)
* 23:48 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]]
* 21:27 andrewbogott: restarting slapd on serpens
* 21:20 cdanis: ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service
* 21:17 andrewbogott: restarting slapd on seaborgium
* 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - [[phab:T254606|T254606]] (duration: 01m 01s)
* 19:49 ryankemper: [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service`
* 19:30 gilles@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/NavigationTiming/extension.json: (no justification provided) (duration: 00m 57s)
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|331a129}}: Remove temporary feature flags ([[phab:T258116|T258116]]) (duration: 00m 57s)
* 19:20 mutante: LDAP - added derick to group nda ([[phab:T268150|T268150]])
* 19:17 moritzm: installing Java security updates on elastic* and relforge*
* 19:09 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:643260 group1: Switch ParserCache to JSON (duration: 00m 57s)
* 18:59 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:56 elukey@deploy1001: Finished deploy [analytics/refinery@1ff0868]: Regular analytics weekly train (duration: 09m 50s)
* 18:56 volans: migrating anycast zonefile to the Netbox-generated ones - [[phab:T258729|T258729]]
* 18:55 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:46 elukey@deploy1001: Started deploy [analytics/refinery@1ff0868]: Regular analytics weekly train
* 18:46 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 18:45 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2
* 18:45 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] (duration: 01m 09s)
* 18:45 elukey: restart memcached on mw2339 to pick up the correct port (was bound on 11211 rather than 11210)
* 18:44 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]]
* 18:19 ejegg: updated Fundraising CiviCRM from {{Gerrit|28464df973}} to {{Gerrit|fb0ad7f39b}}
* 18:07 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:06 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:04 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:51 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:29 elukey: move analytics1064 from C2 to C3 eqiad - [[phab:T267065|T267065]]
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:06 hnowlan: finished removing restbase2009 from cassandra cluster
* 16:01 cmjohnson1: replacing the sfp at cr1-eqiad xe-3/2/1 [[phab:T267672|T267672]]
* 15:42 marostegui: Drop kraken user from s4 - [[phab:T268636|T268636]]
* 15:38 elukey: move druid1005 from rack B7 to B6 - [[phab:T267065|T267065]]
* 15:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:28 jayme: pushed docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0 docker-registry.discovery.wmnet/calico/node:v3.17.0 docker-registry.discovery.wmnet/calico/typha:v3.17.0
* 15:23 jayme: imported calico 3.17.0 into component/calico-future for stretch-wikimedia
* 15:07 godog: swift eqiad-prod: decom ms-be1022 ssd from swift - [[phab:T267870|T267870]]
* 15:01 marostegui: Enable GTID on clouddb1013:3311 clouddb1015:3314 clouddb1017:3311 clouddb1019:3314 [[phab:T267090|T267090]]
* 14:58 elukey: move analytics1072 from rack B2 to B3 - [[phab:T267065|T267065]]
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:53 jayme: imported helmfile 0.135.0-1 into buster-wikimedia and stretch-wikimedia
* 14:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P13392 and previous config saved to /var/cache/conftool/dbconfig/20201124-144219-marostegui.json
* 14:34 liw: finished testing Scap on Beta cluster in prep for https://phabricator.wikimedia.org/T268634
* 14:31 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:27 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13391 and previous config saved to /var/cache/conftool/dbconfig/20201124-141912-root.json
* 14:09 moritzm: reset-failed idp-u2f.service after Hiera change (one time issue, will soon be obsolete)
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13390 and previous config saved to /var/cache/conftool/dbconfig/20201124-140409-root.json
* 13:52 elukey@deploy1001: Finished deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252 (duration: 00m 05s)
* 13:52 elukey@deploy1001: Started deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13389 and previous config saved to /var/cache/conftool/dbconfig/20201124-134905-root.json
* 13:40 marostegui: Stop MySQL on db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13388 and previous config saved to /var/cache/conftool/dbconfig/20201124-133709-marostegui.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13387 and previous config saved to /var/cache/conftool/dbconfig/20201124-133402-root.json
* 13:13 jgleeson: civicrm revision is {{Gerrit|28464df973}}, config revision is {{Gerrit|928918a9b6}}
* 13:01 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.18
* 13:01 liw: done testing Scap release candidate on beta (failed: disk full on deploy01)
* 12:49 hnowlan: disabled cassandra service on restbase2009, starting drain
* 12:30 liw: testing upcoming Scap release on beta
* 12:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:59 jayme: imported helm3 3.4.1-1 into buster-wikimedia and stretch-wikimedia
* 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:52 XioNoX: push CR641949 and CR641949
* 11:38 effie: rolling depool and pool app and api clusters - [[phab:T244340|T244340]]
* 11:25 _joe_: rebuild docker images for [[phab:T268612|T268612]]
* 11:20 effie: disable puppet on api and app servers to rollout onhost memcached - [[phab:T244340|T244340]]
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:14 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 marostegui: Stop mysql on db1125:3312 to clone clouddb1014:3312 and clouddb1018:3312 - [[phab:T267090|T267090]]
* 10:45 moritzm: upgrading seaborgium to Buster
* 10:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:31 jbond42: upload new cas package to wikimedia-buster
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2073', diff saved to https://phabricator.wikimedia.org/P13384 and previous config saved to /var/cache/conftool/dbconfig/20201124-100139-marostegui.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2026', diff saved to https://phabricator.wikimedia.org/P13383 and previous config saved to /var/cache/conftool/dbconfig/20201124-100020-marostegui.json
* 09:48 volans: Migrating codfw private/public primary DNS records to the auto-generated ones from Netbox - [[phab:T258729|T258729]]
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13382 and previous config saved to /var/cache/conftool/dbconfig/20201124-094449-marostegui.json
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P13381 and previous config saved to /var/cache/conftool/dbconfig/20201124-094159-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13380 and previous config saved to /var/cache/conftool/dbconfig/20201124-094052-marostegui.json
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P13379 and previous config saved to /var/cache/conftool/dbconfig/20201124-093517-marostegui.json
* 09:23 marostegui: Deploy schema change on db2114 and db1096:3316 - [[phab:T268004|T268004]]
* 09:13 ema: cp4032: switch back to varnish 6.0.6-1wm2 after [[phab:T264398|T264398]] experiment, fix [[phab:T268243|T268243]]
* 09:09 elukey: drop principals and keytabs for analytics10[42-57] - [[phab:T267932|T267932]]
* 09:03 gilles@deploy1001: Finished deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)
* 09:03 gilles@deploy1001: Started deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it
* 08:49 _joe_: uploading the base production docker images for MediaWiki, [[phab:T265324|T265324]]
* 08:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:43 _joe_: refreshing debian buster base image
* 08:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:31 marostegui: Deploy user for pki database for dbproxy1012, dbproxy1014, dbproxy2001 - [[phab:T268329|T268329]]
* 08:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:27 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:58 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13378 and previous config saved to /var/cache/conftool/dbconfig/20201124-074342-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13377 and previous config saved to /var/cache/conftool/dbconfig/20201124-073202-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13376 and previous config saved to /var/cache/conftool/dbconfig/20201124-073125-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13375 and previous config saved to /var/cache/conftool/dbconfig/20201124-072755-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13374 and previous config saved to /var/cache/conftool/dbconfig/20201124-072715-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13373 and previous config saved to /var/cache/conftool/dbconfig/20201124-072249-marostegui.json
* 07:00 _joe_: changing the mtail recipe for mediawiki/apache to use an actual histogram
* 06:31 marostegui: Sanitize clouddb1019:3314 [[phab:T267090|T267090]]
* 06:28 marostegui: Sanitize clouddb1015:3314 [[phab:T267090|T267090]]
* 03:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 05s)
* 00:29 reedy@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 06s)
== 2020-11-23 ==
* 22:56 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:52 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 22:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:54 mutante: mwdebug1003 - removing php packages and letting puppet reinstall them after it has the correct APT config [[phab:T267248|T267248]]
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 mutante: mwdebug1003 - scap pull because <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mwdebug1003 is CRITICAL
* 20:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:09 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 04s)
* 20:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 20:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 00m 42s)
* 19:22 Urbanecm: Morning B&C done
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 01m 05s)
* 19:15 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.18)
* 19:12 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.16)
* 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7561926e1dede35c2ad27d587c044a5ebf5e6648}}: GrowthExperiments: Enable help panel top-posting on svwiki, ruwiki ([[phab:T268227|T268227]]) (duration: 01m 06s)
* 17:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:41 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2010.codfw.wmnet
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:29 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 17:22 mutante: DNS - new project language 'skr' added - Saraiki ( سرائیکی Sarā'īkī, also spelt Siraiki, or Seraiki) is an Indo-Aryan language of the Lahnda group, spoken in the south-western half of the province of Punjab in Pakistan.
* 17:12 elukey: move aqs1004 from rack A4 to A3 - [[phab:T267065|T267065]]
* 17:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:58 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:37 elukey: move analytics1070 from rack A7 to rack A5 - [[phab:T267065|T267065]]
* 15:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:13 godog: add ipv6 forward/reverse records for grafana1002 / grafana2001
* 15:05 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 14:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2009.codfw.wmnet
* 14:10 kormat: cleaning up heartbeat.heartbeat on pc3 [[phab:T268336|T268336]]
* 14:09 kormat: cleaning up heartbeat.heartbeat on pc2 [[phab:T268336|T268336]]
* 14:04 kormat: cleaning up heartbeat.heartbeat on pc1 [[phab:T268336|T268336]]
* 14:01 moritzm: imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia [[phab:T245757|T245757]]
* 13:56 XioNoX: push CR641960
* 13:56 godog: add ms-be106[0-3] to eqiad-prod with minimal weight - [[phab:T268435|T268435]]
* 13:17 moritzm: imported ploticus 2.42-4.2~wmf1 to buster-wikimedia [[phab:T245757|T245757]]
* 13:11 Lucas_WMDE: EU backport+config window done
* 13:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/Wikibase: Backport: [[gerrit:642103{{!}}Calculate page props on-the-fly during RDF dump (T145712)]] (duration: 01m 14s)
* 13:01 hnowlan: started cassandra pooling maps2009
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13370 and previous config saved to /var/cache/conftool/dbconfig/20201123-125815-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13369 and previous config saved to /var/cache/conftool/dbconfig/20201123-125759-marostegui.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13368 and previous config saved to /var/cache/conftool/dbconfig/20201123-125417-marostegui.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13367 and previous config saved to /var/cache/conftool/dbconfig/20201123-125345-marostegui.json
* 12:34 Lucas_WMDE: Undeployed patch for [[phab:T260349|T260349]]
* 12:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2008.codfw.wmnet
* 12:32 Urbanecm: Run scap pull at mwdebug1003
* 12:28 marostegui: Stop mysql on db1121 to clone clouddb1017:3314 clouddb1019:3314
* 12:27 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone clouddb1017:3314 clouddb1019:3314 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13366 and previous config saved to /var/cache/conftool/dbconfig/20201123-122549-marostegui.json
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c00d7e8e4c407b76aa2930dfa040394e874d77bc}}: Move ContentTranslation out of Beta for br, ka, ast, si and ig WPs ([[phab:T267212|T267212]], [[phab:T266217|T266217]], [[phab:T266218|T266218]], [[phab:T266219|T266219]], [[phab:T266220|T266220]]) (duration: 01m 06s)
* 12:01 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; [[phab:T246539|T246539]])
* 11:49 XioNoX: eqiad row A, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 05s)
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 21s)
* 11:25 hnowlan: starting cassandra bootstrap of maps2008
* 11:20 effie: enable puppet on cp* hosts
* 11:16 moritzm: installing poppler security updates on stretch
* 11:13 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 11:13 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:05 XioNoX: eqiad row A, standardize interfaces descriptions and ranges order
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:26 effie: disable puppet on cp* hosts to merge 641730
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:26 moritzm: rebooting serpens
* 10:21 XioNoX: eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 09:48 XioNoX: eqiad row B, standardize interfaces descriptions and ranges order
* 08:46 elukey: drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster)
* 08:43 godog: start stress testing on ms-be106* - [[phab:T268435|T268435]]
* 08:41 elukey: drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster)
* 08:36 elukey: drop analytics1028's krb principals from krb1001 - old decommed node
* 08:35 moritzm: installing remaining krb5 security updates for Stretch
* 07:27 marostegui: Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on Commonswiki on wikireplicas - [[phab:T267090|T267090]]
* 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:46 marostegui: Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing [[phab:T267090|T267090]]
== 2020-11-21 ==
* 09:18 joal: Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60Tb
* 09:17 joal: Drop historical logs of '
* 08:28 ariel@deploy1001: Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s)
* 08:28 ariel@deploy1001: Started deploy [dumps/dumps@1a76a9a]: revinfo updates
* 08:10 elukey: remove big stderrlog file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110
* 08:05 elukey: remove big stderrlog file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105
== 2020-11-20 ==
* 23:38 mutante: synced puppet-compiler facts - new hosts should be usable in compiler
* 22:30 mutante: cumin1001 - sudo systemctl start cumin-check-aliases ->  <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK  [[phab:T268369|T268369]]
* 21:30 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:26 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:09 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:52 mutante: releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts
* 19:45 mutante: releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed)
* 19:39 mutante: Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications to remove monitoring noise; from 72 to 25 active alerts
* 19:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 18:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:36 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 18:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 18:14 dwisehaupt: shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - [[phab:T267259|T267259]]
* 17:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:32 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:24 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:48 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:40 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 16:29 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:29 razzi@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 16:28 razzi: removed canceled ip address records for kafka-test1002 from netbox
* 16:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:01 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:01 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:42 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:01 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 14:58 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:30 elukey: force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings
* 14:28 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 14:00 elukey: restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 13:34 liw: finished trying to test scap on beta cluster
* 13:24 bblack: cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs
* 13:12 liw: testing upcoming Scap release on beta
* 13:00 bblack: dns*: upgrade remainder of fleet to gdnsd 3.4.1
* 12:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 12:29 moritzm: uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json
* 12:14 marostegui: Run check private data on clouddb1013:3311  clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 [[phab:T267090|T267090]]
* 12:11 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; [[phab:T246539|T246539]])
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P13348 and previous config saved to /var/cache/conftool/dbconfig/20201120-114614-marostegui.json
* 11:15 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13347 and previous config saved to /var/cache/conftool/dbconfig/20201120-104459-root.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13345 and previous config saved to /var/cache/conftool/dbconfig/20201120-102955-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13344 and previous config saved to /var/cache/conftool/dbconfig/20201120-101452-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13342 and previous config saved to /var/cache/conftool/dbconfig/20201120-095949-root.json
* 09:56 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/642346)
* 09:21 marostegui: Move pc2010 right under pc1007 to investigate lag issues (using orchestrator for this move)
* 09:07 moritzm: updating krb5 on krb*
* 08:57 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:50 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 08:32 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 08:31 elukey: roll restart kafka daemons on kafka-jumbo100* to pick up openjdk upgrades
* 08:13 marostegui: Enable GTID on clouddb1015:3316 clouddb1019:3316 - [[phab:T267090|T267090]]
* 08:10 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/642268)
* 08:04 marostegui: Stop db1124:3313 to clone clouddb1013:3313, clouddb1017:3313
* 08:00 XioNoX: update cloud-in4 filter in codfw
* 04:57 bblack: dns3001: upgrade gdnsd to 3.4.1
* 04:55 bblack: authdns1001: upgrade gdnsd to 3.4.1
* 04:49 bblack: authdns2001: upgrade gdnsd to 3.4.1
* 04:45 bblack: dns3002: upgrade gdnsd to 3.4.1
* 04:41 bblack: reprepro: uploaded gdnsd-3.4.1-1~wmf1 to buster-wikimedia
== 2020-11-19 ==
* 23:59 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:06 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:23 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:07 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:06 krinkle@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo/: [[phab:T267668|T267668]] - {{Gerrit|I1115135ee}}, and {{Gerrit|Ic239bb9807}} (duration: 01m 07s)
* 20:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:12 herron: upgraded logstash-next to kibana 7.10
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:48 mutante: gerrit1001 - re-enabling puppet after merging gerrit:642086 for [[phab:T268260|T268260]] (upstream bug 13701)
* 18:41 mutante: gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%<nowiki>{</nowiki>REQUEST_SCHEME<nowiki>}</nowiki> in apache config, reloaded apache to fix redirect issue
* 18:37 mutante: gerrit1001 - disabled puppet
* 18:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 17:59 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:47 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:33 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s)
* 17:33 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5
* 17:32 hashar: Upgrading Gerrit to 3.2.5 and restarting it
* 17:05 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s)
* 17:04 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 16:59 ryankemper: [[phab:T246345|T246345]] [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy
* 16:58 kormat: started mariadb on pc2010, now with more 🤞
* 16:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:54 kormat: stopping mariadb on pc2010
* 16:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:43 hashar: Restarting Gerrit replica instance on gerrit2001
* 16:42 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s)
* 16:42 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server)
* 16:41 kormat: stopped and started replication on pc2010 to see if that would help it recover
* 16:40 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s)
* 16:40 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5
* 16:35 elukey: roll restart hadoop workers for openjdk upgrades
* 16:35 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 15:58 moritzm: installing jupyter-notebook security updates on an-coord*
* 15:56 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 15:52 bblack: dns*: upgrade to gdnsd-3.4.0 on remainder of the dns fleet
* 15:44 bblack: dns3001: upgrade gdnsd to 3.4.0
* 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:41 bblack: dns1001: upgrade gdnsd to 3.4.0
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:36 bblack: dns3002: upgrade gdnsd to 3.4.0
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:31 bblack: authdns1001: upgrade gdnsd to 3.4.0
* 15:30 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:57 moritzm: installing openldap security updates on buster (client side tools/libs, slapd already updated)
* 14:54 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:47 marostegui: Sanitize enwiki on clouddb1017 [[phab:T267090|T267090]]
* 14:45 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:43 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:41 marostegui: Sanitize enwiki on clouddb1013 [[phab:T267090|T267090]]
* 14:39 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 14:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:29 moritzm: rolling restart of app server canaries to pick up latest sec updates
* 14:21 moritzm: installing krb5 security updates on stretch
* 14:02 bblack: authdns2001: upgrade gdnsd to 3.4.0
* 13:45 XioNoX: push current state of audited cloud-in4 filter - [[phab:T264993|T264993]]
* 13:42 moritzm: removing stray wireshark 2.2.6 libs on Stretch
* 13:32 moritzm: installing wireshark security updates
* 13:30 bblack: dns4002: upgrade gdnsd to 3.4.0
* 13:28 bblack: reprepro: updated buster-wikimedia gdnsd package to 3.4.0-1~wmf1
* 12:43 moritzm: installing libproxy security updates on stretch
* 12:38 marostegui: Stop mysql on db1106 to clone clouddb1013 and clouddb1017 [[phab:T267090|T267090]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json
* 12:00 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:44 moritzm: installing Java security updates on Hadoop/Kafka Jumbo hosts
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:00 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; [[phab:T246539|T246539]])
* 10:28 marostegui: Restart mysql on db1115, tendril and dbtree will be down for a few minutes
* 09:40 marostegui: Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - [[phab:T267090|T267090]]
* 09:29 moritzm: upgrading serpens to Buster
* 09:26 XioNoX: eqiad row C: move Ganeti/LVS interfaces to individual terms
* 09:07 elukey: restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary)
* 08:56 effie: disable puppet on mw canaries to merge 641816
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:49 elukey: restart hadoop daemons on analytics1058 for openjdk upgrades (canary)
* 08:25 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:19 XioNoX: eqiad row C: standardize interfaces config
* 07:55 XioNoX: eqiad row D: move Ganeti/LVS interfaces to individual terms
* 07:47 XioNoX: eqiad row D: standardize interfaces config
* 07:22 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 07:05 elukey: roll restart java daemons on Hadoop test for openjdk upgrades
* 07:05 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:21 marostegui: Remove es1014 from tendril and zarcillo [[phab:T268102|T268102]]
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:08 marostegui: Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - [[phab:T267090|T267090]]
* 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
== 2020-11-18 ==
* 23:34 mutante: disabling puppet on memcache::mediawiki - deploying gerrit:637742
* 22:56 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]] (duration: 00m 04s)
* 22:56 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]]
* 22:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; [[phab:T268181|T268181]]) (duration: 01m 04s)
* 22:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; [[phab:T268181|T268181]]) (duration: 01m 06s)
* 22:05 urbanecm@deploy1001: Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list ([[phab:T268181|T268181]]) (duration: 01m 05s)
* 21:53 mutante: mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring [[phab:T267248|T267248]]
* 21:47 mutante: mwdebug1003 - scap pull - [[phab:T267248|T267248]]
* 21:40 mutante: mw1317,mw1318 - back in action and all monitoring activated again
* 21:17 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
* 21:02 mutante: mw1317,mw1318 - repooled=no after physical move to rack B
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 20:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 20:27 mutante: mw1317, mw1318 shutting down for physical move
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1318.eqiad.wmnet
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1317.eqiad.wmnet
* 20:15 mutante: mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 ([[phab:T266164|T266164]])
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 20:10 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 03s)
* 20:09 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 20:03 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 20:03 akosiaris@cumin1001: conftool action : set/weight=0; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 19:53 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 19:48 otto@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 06s)
* 19:45 otto@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 07s)
* 19:26 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:635607 - Switch ParserCache to JSON for group0 wikis (duration: 01m 05s)
* 19:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:635086 - Enable parsoid on api_appserver (duration: 01m 04s)
* 19:19 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:13 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:641527 - Set  to 0 (duration: 01m 04s)
* 18:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 17:18 elukey: shutdown an-presto1004 for hw maintenance
* 17:13 akosiaris: [[phab:T241230|T241230]] pool codfw kubernetes for recommendation-api at a very low weight
* 17:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 17:12 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 16:52 jbond42: drop os_version/requires_os functions from wmflib
* 16:50 elukey: update /etc/krb5.keytab on krb1001/krb2001 to match the most up to date key version for host/krb2001.codfw.wmnet
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:44 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:38 reedy@deploy1001: Synchronized wmf-config/logging.php: [[phab:T268141|T268141]] (duration: 01m 06s)
* 16:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:27 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:56 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 Urbanecm: mwscript deleteEqualMessages.php --wiki=cswiki --delete
* 15:14 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:05 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:03 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 14:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:30 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:13 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 14:09 elukey: copied /etc/krb5.keytab from krb1001 to krb2001 (the latter contained only one principal, for 2001; the former contained principals for both 1001 and 2001)
* 14:05 moritzm: installing openldap security updates on ro replicas
* 14:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:02 elukey: restart krb5-kpropd.service on krb2001 to force the pick up of new client configs
* 13:35 bblack: cache_text: Executing "varnishadm -n frontend param.set nuke_limit 1000" - [[phab:T266373|T266373]]
* 13:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:30 moritzm: installing openldap security updates on corp replicas
* 13:08 Urbanecm: EU B&C done (~15 minutes ago)
* 12:43 akosiaris: sync staging cluster's helmfile.d/admin state. Aside from calico, the rest is a noop
* 12:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|5488f56c7458fa8fb9be5f41f131e00b26a84cc0}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 12:25 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|45d71a37f381e81e5382c8e10ac4063c9665beb8}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 12:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/<nowiki>{</nowiki>bnwiki,bnwiki-1.5x,bnwiki-2x<nowiki>}</nowiki>.png ([[phab:T265553|T265553]])
* 12:13 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=releases
* 12:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|70aabf7ec8e1b549e78978e48967fb70d21316de}}: Regenerate Bengali Wikipedia logo ([[phab:T265553|T265553]]) (duration: 01m 06s)
* 12:06 akosiaris@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=wikifeeds
* 12:01 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after restarting mysql [[phab:T266483|T266483]] (duration: 01m 06s)
* 12:00 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=blubberoid,name=eqiad
* 11:56 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; [[phab:T246539|T246539]])
* 11:56 marostegui: Restart mysql on pc1009 [[phab:T266483|T266483]]
* 11:56 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 11:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 01m 18s)
* 11:40 XioNoX: eqiad row D: remove un-needed "enable" keywords
* 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99)
* 10:59 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert
* 10:58 jbond42: renew sretest1002 ssl cert to test cookbook
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:25 godog: ms-be1022 - disable failed sdb
* 10:01 XioNoX: eqiad row D: Standardize interfaces descriptions
* 09:56 moritzm: uploaded libexif 0.6.21-2+deb8u4+wmf1 to jessie-wikimedia
* 09:22 elukey: set dns_canonicalize_hostname = false to all kerberos clients
* 09:13 jbond42: renew puppet certificate of seaborgium
* 08:34 marostegui: Stop MySQL on es1011, es1012, es1014 [[phab:T268100|T268100]] [[phab:T268101|T268101]] [[phab:T268102|T268102]]
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1012 from dbctl [[phab:T268101|T268101]]', diff saved to https://phabricator.wikimedia.org/P13326 and previous config saved to /var/cache/conftool/dbconfig/20201118-082942-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13325 and previous config saved to /var/cache/conftool/dbconfig/20201118-082636-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13324 and previous config saved to /var/cache/conftool/dbconfig/20201118-082618-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 80%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13323 and previous config saved to /var/cache/conftool/dbconfig/20201118-081115-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13322 and previous config saved to /var/cache/conftool/dbconfig/20201118-075612-root.json
* 07:45 marostegui: Deploy schema change on db1098:3316 [[phab:T267335|T267335]] [[phab:T267399|T267399]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13321 and previous config saved to /var/cache/conftool/dbconfig/20201118-074108-root.json
* 07:28 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13320 and previous config saved to /var/cache/conftool/dbconfig/20201118-072605-root.json
* 07:16 marostegui: Run check table on s6 on db1125:3316 [[phab:T267090|T267090]]
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13319 and previous config saved to /var/cache/conftool/dbconfig/20201118-071101-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13318 and previous config saved to /var/cache/conftool/dbconfig/20201118-065558-root.json
* 06:53 elukey: restart also mirror maker on kafka-main1001/1003 (seems not related but just to clear old errors and a possible weird state)
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 100%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13317 and previous config saved to /var/cache/conftool/dbconfig/20201118-064556-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13316 and previous config saved to /var/cache/conftool/dbconfig/20201118-064054-root.json
* 06:37 elukey: restart kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1002 - consumer msg rate low since kafka-main2003 went down for codfw c7 failure
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 75%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13315 and previous config saved to /var/cache/conftool/dbconfig/20201118-063052-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13314 and previous config saved to /var/cache/conftool/dbconfig/20201118-062551-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1014 from dbctl', diff saved to https://phabricator.wikimedia.org/P13313 and previous config saved to /var/cache/conftool/dbconfig/20201118-062547-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 50%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13312 and previous config saved to /var/cache/conftool/dbconfig/20201118-061549-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13311 and previous config saved to /var/cache/conftool/dbconfig/20201118-061340-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1027 as new es1 master', diff saved to https://phabricator.wikimedia.org/P13310 and previous config saved to /var/cache/conftool/dbconfig/20201118-061218-marostegui.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1011 from dbctl', diff saved to https://phabricator.wikimedia.org/P13309 and previous config saved to /var/cache/conftool/dbconfig/20201118-061112-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1032 with minimum weight on es1 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13308 and previous config saved to /var/cache/conftool/dbconfig/20201118-060641-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 25%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13307 and previous config saved to /var/cache/conftool/dbconfig/20201118-060045-root.json
* 05:47 marostegui: Run check table on enwiki on db1124:3311 [[phab:T267090|T267090]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 10%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13306 and previous config saved to /var/cache/conftool/dbconfig/20201118-054542-root.json
* 00:53 tgr_: also deployed [[gerrit:641294{{!}}Suggested Edits: Guard against task type not existing (T268012)]]
* 00:52 tgr@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:641295{{!}}Suggested edits: Guard against empty topic data (T268015)]] (duration: 01m 07s)
* 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:641250{{!}}Enable watchlist expiry feature on Wikidata & Commons (T266874)]] (duration: 01m 03s)
== 2020-11-17 ==
* 22:54 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 00m 07s)
* 22:54 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
* 22:53 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 12m 51s)
* 22:45 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 22:40 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
* 22:39 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 22:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 22:10 mutante: otrs1001 - systemctl start otrs-cache-cleanup
* 22:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere (duration: 11m 07s)
* 22:07 mutante: otrs1001 - removing otrs-cache-cleanup cron from otrs's crontab - adding same command as systemd timer. gerrit:637038 [[phab:T265138|T265138]]
* 21:57 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere
* 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw (duration: 07m 11s)
* 21:24 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw
* 20:56 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.18
* 20:43 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; [[phab:T246539|T246539]])
* 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.18 (duration: 39m 37s)
* 19:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:52 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.18
* 19:50 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010 (duration: 02m 03s)
* 19:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010
* 19:46 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.11 (duration: 13m 05s)
* 19:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 19:18 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: wgEventStreamsDefaultSettings in beta should only set eqiad as topic prefix - [[phab:T253069|T253069]] (duration: 02m 26s)
* 19:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 19:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:38 ejegg: updated standalone SmashPig deployment from {{Gerrit|09f29c1da5}} to {{Gerrit|63dffcb11f}}
* 18:36 ejegg: updated fundraising python tools from {{Gerrit|68e054c9ad}} to {{Gerrit|41cab089da}}
* 18:09 jynus: stopping db1139 for hw maintenance [[phab:T261405|T261405]]
* 17:59 dpifke@deploy1001: Finished deploy [performance/navtiming@8eaf7db]: (no justification provided) (duration: 00m 05s)
* 17:58 dpifke@deploy1001: Started deploy [performance/navtiming@8eaf7db]: (no justification provided)
* 17:37 dpifke@deploy1001: Finished deploy [performance/coal@43b91df]: (no justification provided) (duration: 00m 06s)
* 17:37 dpifke@deploy1001: Started deploy [performance/coal@43b91df]: (no justification provided)
* 17:34 dpifke@deploy1001: Finished deploy [statsv/statsv@249d073]: (no justification provided) (duration: 00m 05s)
* 17:34 dpifke@deploy1001: Started deploy [statsv/statsv@249d073]: (no justification provided)
* 17:27 dpifke@deploy1001: Finished deploy [statsv/statsv@873ea90]: (no justification provided) (duration: 00m 05s)
* 17:27 dpifke@deploy1001: Started deploy [statsv/statsv@873ea90]: (no justification provided)
* 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:16 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55d4d41]: (no justification provided) (duration: 00m 04s)
* 17:16 dpifke@deploy1001: Started deploy [performance/arc-lamp@55d4d41]: (no justification provided)
* 17:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: (no justification provided) (duration: 00m 04s)
* 17:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: (no justification provided)
* 17:08 dpifke@deploy1001: Finished deploy [performance/coal@5a32eb2]: (no justification provided) (duration: 00m 04s)
* 17:08 dpifke@deploy1001: Started deploy [performance/coal@5a32eb2]: (no justification provided)
* 16:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 16:42 jbond42: re-enable puppet fleet wide
* 16:36 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:33 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:22 moritzm: uploaded zeromq3 4.0.5+dfsg-2+deb8u2+wmf1 to jessie-wikimedia
* 16:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:13 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 volans: powercycle ms-be1030.eqiad.wmnet, unresponsive to ping/ssh, no prompt in console, nothing in hw logs
* 15:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:27 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:16 jbond42: disable puppet fleet wide
* 15:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:59 cdanis@deploy1001: Synchronized docroot/thankyou: Special docroot for thankyouwiki [[phab:T259312|T259312]] {{Gerrit|d2a20ec57}} (duration: 00m 55s)
* 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:57 elukey: shutdown stat1008 for RAM expansion
* 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:47 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 XioNoX: codfw row A: move ganeti and LVS from interface-range to individual term
* 14:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:37 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; [[phab:T246539|T246539]])
* 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:03 XioNoX: codfw row A: standardize interfaces
* 13:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:36 XioNoX: codfw row B: move ganeti, Cloud and LVS from interface-range to individual term
* 13:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:22 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:09 XioNoX: codfw row B: remove extra "enable"
* 12:59 Lucas_WMDE: EU backport&config window done (again ☺)
* 12:58 moritzm: updating idp-test* to 6.2.4-2
* 12:57 XioNoX: codfw row B: Standardize interfaces descriptions
* 12:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:641293{{!}}Suggested Edits: Guard against task type not existing (T268012)]] (duration: 00m 58s)
* 12:53 bblack: cpNNNN: removing old (30d+) failure reports from /var/cache/ocsp
* 12:42 moritzm: IDP updated to 6.2.4
* 12:33 Lucas_WMDE: reopen EU backport&config window
* 12:23 XioNoX: codfw row C: move ganeti and LVS from interface-range to individual term
* 12:15 XioNoX: codfw row C: remove extra "enable"
* 12:15 Lucas_WMDE: EU backport&config window done
* 12:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2006.codfw.wmnet
* 12:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:631496{{!}}Remove migration settings in InitialiseSettings.php (T264286)]], 2/2 (labs) (duration: 00m 56s)
* 12:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631496{{!}}Remove migration settings in InitialiseSettings.php (T264286)]], 1/2 (prod) (duration: 00m 56s)
* 12:05 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:631431{{!}}Remove migration settings in Wikibase.php (T264286)]] (duration: 00m 57s)
* 11:51 XioNoX: codfw row C: Standardize interface descriptions
* 10:46 marostegui: Run a test on check_private_data on clouddb1013 for s1 and s3 - [[phab:T267090|T267090]]
* 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 in pc2 after restarting mysql [[phab:T266483|T266483]] (duration: 00m 56s)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:19 marostegui: Restart mysql on pc1008 [[phab:T266483|T266483]]
* 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 00m 57s)
* 09:29 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 09:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:10 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 09:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:56 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:56 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1028 as new es3 master', diff saved to https://phabricator.wikimedia.org/P13301 and previous config saved to /var/cache/conftool/dbconfig/20201117-085542-marostegui.json
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 before decommissioning it and pool es1026 as new es2 master', diff saved to https://phabricator.wikimedia.org/P13300 and previous config saved to /var/cache/conftool/dbconfig/20201117-085432-marostegui.json
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13299 and previous config saved to /var/cache/conftool/dbconfig/20201117-084744-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13298 and previous config saved to /var/cache/conftool/dbconfig/20201117-084733-root.json
* 08:43 marostegui: Truncate tendril.global_status_log - [[phab:T231185|T231185]]
* 08:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13297 and previous config saved to /var/cache/conftool/dbconfig/20201117-083241-root.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 80%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13296 and previous config saved to /var/cache/conftool/dbconfig/20201117-083229-root.json
* 08:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:22 volans: restart netbox on netbox1001 to test new logging configuration
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13295 and previous config saved to /var/cache/conftool/dbconfig/20201117-081737-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13294 and previous config saved to /var/cache/conftool/dbconfig/20201117-081726-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13293 and previous config saved to /var/cache/conftool/dbconfig/20201117-080234-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 60%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13292 and previous config saved to /var/cache/conftool/dbconfig/20201117-080222-root.json
* 07:58 XioNoX: codfw row D: Convert LVS ranges to individual interfaces
* 07:54 XioNoX: codfw row D: explicitly set access ports to "interface-mode access"
* 07:49 XioNoX: split codfw row D ganeti switch ports out of the interface group
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13291 and previous config saved to /var/cache/conftool/dbconfig/20201117-074730-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13290 and previous config saved to /var/cache/conftool/dbconfig/20201117-074719-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13289 and previous config saved to /var/cache/conftool/dbconfig/20201117-073227-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 30%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13288 and previous config saved to /var/cache/conftool/dbconfig/20201117-073216-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 100%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13287 and previous config saved to /var/cache/conftool/dbconfig/20201117-073057-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 100%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13286 and previous config saved to /var/cache/conftool/dbconfig/20201117-073032-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13285 and previous config saved to /var/cache/conftool/dbconfig/20201117-071723-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13284 and previous config saved to /var/cache/conftool/dbconfig/20201117-071712-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 75%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13283 and previous config saved to /var/cache/conftool/dbconfig/20201117-071553-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 75%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13282 and previous config saved to /var/cache/conftool/dbconfig/20201117-071529-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13281 and previous config saved to /var/cache/conftool/dbconfig/20201117-070220-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 20%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13280 and previous config saved to /var/cache/conftool/dbconfig/20201117-070209-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 50%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13278 and previous config saved to /var/cache/conftool/dbconfig/20201117-070050-root.json
* 07:00 marostegui: Stop mysql on db1124: s1 and s3, this will generate lag on enwiki and s3 on labsdb - [[phab:T267090|T267090]]
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 50%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13277 and previous config saved to /var/cache/conftool/dbconfig/20201117-070025-root.json
* 06:51 marostegui: Upgrade db1077 and pc2010 to 10.4.17
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13276 and previous config saved to /var/cache/conftool/dbconfig/20201117-064716-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13275 and previous config saved to /var/cache/conftool/dbconfig/20201117-064705-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 25%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13274 and previous config saved to /var/cache/conftool/dbconfig/20201117-064546-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 25%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13273 and previous config saved to /var/cache/conftool/dbconfig/20201117-064522-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1034 with minimum weight on es3 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13272 and previous config saved to /var/cache/conftool/dbconfig/20201117-063933-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1033 with minimum weight on es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13271 and previous config saved to /var/cache/conftool/dbconfig/20201117-063805-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 10%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13270 and previous config saved to /var/cache/conftool/dbconfig/20201117-063043-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 10%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13269 and previous config saved to /var/cache/conftool/dbconfig/20201117-063019-root.json
* 02:37 dwisehaupt: shifted portion of thank you emails flowing through frmx's to 60% of the total volume
* 01:59 eileen_: civicrm revision is {{Gerrit|b6fe8bd791}}, config revision is {{Gerrit|61e2000391}}
== 2020-11-16 ==
* 23:28 mutante: cumin1001 - sudo systemctl start cumin-check-aliases (to confirm switching cron to timer worked) [[phab:T265138|T265138]]
* 22:22 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 22:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 22:09 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:06 mutante: planet - fixed updates of uk.planet which failed due to non-ASCII chars in a URL - since updates are systemd timers now that affects the entire systemd state monitoring
* 21:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 21:40 rzl@cumin1001: conftool action : set/weight=1; selector: name=mw2250.codfw.wmnet,cluster=videoscaler,service=canary
* 21:38 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet,cluster=jobrunner
* 21:30 mutante: peek2001 - mv /var/lib/peek/git to git.old ; run puppet ; let it fix git checkout
* 21:07 rzl: disable puppet on jobrunners [[phab:T264991|T264991]]
* 20:40 mutante: planet1002/planet2002 - delete entire crontab of user planet, drop update cronjobs after switching to systemd timers with gerrit:636105 ([[phab:T265138|T265138]])
* 20:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 mutante: releases2002 systemctl reset-failed should clear Icinga systemd alert after gerrit:641228
* 20:05 dwisehaupt: disabling process-control jobs and moving to maintenance mode for maint window
* 19:57 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint (duration: 02m 27s)
* 19:51 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint
* 19:48 effie: disable puppet on parsoid servers - [[phab:T264991|T264991]]
* 19:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 18:59 mutante: mw2255 - is pooled and puppet works on next run, after it removed php 7.2 config files
* 18:56 mutante: running puppet on mw2313 and mw2255 which were listed in puppetboard as failed puppet runs
* 18:15 rzl: disable puppet on 'A:mw-api and not A:mw-api-canary' [[phab:T264991|T264991]]
* 18:05 effie: disable puppet on all appservers
* 17:48 elukey: enable and run puppet on kafka-main2003 (it will start kafka services) - [[phab:T267865|T267865]]
* 17:42 dwisehaupt: frmon1001 upgraded to buster
* 17:36 volans: moved interfaces in Netbox from old to new switch - [[phab:T267865|T267865]]
* 17:24 vgutierrez: switching back from lvs2010 to lvs2007 - [[phab:T267865|T267865]]
* 17:21 vgutierrez: repooling cp2037 and cp2038 - [[phab:T267865|T267865]]
* 16:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 16:16 XioNoX: update c7 serial in row C VC config - [[phab:T267865|T267865]]
* 16:16 rzl: disable puppet on A:mw-api-canary [[phab:T264991|T264991]]
* 16:14 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 16:08 effie: disable puppet in appservers canaries to install ICU 63 - [[phab:T264991|T264991]]
* 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet
* 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2037.codfw.wmnet
* 16:06 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 16:03 hnowlan: joined maps2006 to maps codfw cassandra cluster
* 16:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:57 hnowlan: roll-restarting eqiad restbase for java security updates
* 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:40 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:40 cdanis@cumin1001: START - Cookbook sre.network.cf
* 14:16 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 in pc1 after restarting mysql [[phab:T266483|T266483]] (duration: 00m 59s)
* 14:06 marostegui: Restart pc1007's mysql [[phab:T266483|T266483]]
* 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 01m 00s)
* 13:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 13:00 kormat: running schema change against s1 in codfw [[phab:T259831|T259831]]
* 12:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 moritzm: installing tcpdump security updates
* 12:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:25 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 12:25 hnowlan: roll-restarting restbase-codfw
* 12:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 hnowlan: roll restarting sessionstore for java updates
* 11:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 11:13 moritzm: installing poppler security updates
* 10:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
* 10:44 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
* 09:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
* 09:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 08:39 godog: centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc
* 08:35 moritzm: installing codemirror-js security updates
* 08:32 XioNoX: asw-c-codfw> request system power-off member 7 - [[phab:T267865|T267865]]
* 08:24 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s)
* 08:23 joal@deploy1001: Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb]
* 08:23 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s)
* 08:13 joal@deploy1001: Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb]
* 08:08 XioNoX: asw-c-codfw> request system power-off member 7 - [[phab:T267865|T267865]]
* 06:35 marostegui: Stop replication on s3 codfw master (db2105) for MCR schema change deployment [[phab:T238966|T238966]]
* 06:14 marostegui: Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - [[phab:T261717|T261717]]
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json
* 06:02 marostegui: Restart mysql on db1115 (tendril/dbtree) due to memory usage
* 00:55 shdubsh: re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - [[phab:T267865|T267865]]
* 00:19 elukey: run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - [[phab:T267865|T267865]]
* 00:09 elukey: sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - [[phab:T267865|T267865]]
== 2020-11-15 ==
* 22:10 cdanis: restart some purgeds in ulsfo as well [[phab:T267865|T267865]] [[phab:T267867|T267867]]
* 22:03 cdanis: [[phab:T267867|T267867]] [[phab:T267865|T267865]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged'
* 14:00 cdanis: powercycling ms-be1022 via mgmt
* 11:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 vgutierrez: depooling lvs2007, lvs2010 taking over text traffic on codfw - [[phab:T267865|T267865]]
* 10:00 elukey: cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1
* 09:57 elukey: restart purged on cp4028 (consumer stuck due to kafka-main2003 down)
* 09:55 elukey: restart purged on cp4025 (consumer stuck due to kafka-main2003 down)
* 09:53 elukey: restart purged on cp4031 (consumer stuck due to kafka-main2003 down)
* 09:50 elukey: restart purged on cp4022 (consumer stuck due to kafka-main2003 down)
* 09:42 elukey: restart purged on cp2028 (kafka-main2003 is down and there are connect timeout errors)
* 09:07 Urbanecm: Change email for SUL user Botopol via resetUserEmail.php ([[phab:T267866|T267866]])
* 08:27 elukey: truncate -s 10g /var/lib/hadoop/data/n/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000177/stderr on an-worker1100
* 08:24 elukey: sudo truncate -s 10g /var/lib/hadoop/data/c/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000019/stderr on an-worker1098
== 2020-11-13 ==
* 22:06 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=myvwiki autopatrolled # [[phab:T105570|T105570]]
* 22:04 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki editor # [[phab:T105570|T105570]]
* 21:42 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwikinews reviewer # [[phab:T105570|T105570]]
* 21:40 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=bnwiki editor # [[phab:T105570|T105570]]
* 21:39 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki flood # [[phab:T105570|T105570]]
* 21:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=test2wiki upwizcampeditors # [[phab:T105570|T105570]]
* 21:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=aawiki communityapplica # [[phab:T105570|T105570]]
* 21:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwiki epadmin # [[phab:T105570|T105570]]
* 16:50 _joe_: manually rotate user.log on centrallog1001 and moved it to /srv/user.log.manual-rotation
* away: updated fundraising CiviCRM from {{Gerrit|f7954c6659}} to {{Gerrit|74d795408f}}
* 08:15 vgutierrez: restart acme-chief on acmechief1001
* 01:30 TimStarling: on mwmaint1002 running fixT260485.php unmerged fixup script from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/640348
== 2020-11-12 ==
* 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f0f8397424d4337cdcd61f7acb276d4f0b1facd}}: Enable "Cite" button in toolbar for enwiktionary ([[phab:T267504|T267504]]) (duration: 00m 58s)
* 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3ce18e6f63abe060c05c40239b651086f65a1a33}}: Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267784|T267784]]) (duration: 01m 00s)
* 16:12 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux at mwmaint1002 (wiki=jawiki; [[phab:T246539|T246539]])
* 16:11 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; [[phab:T246539|T246539]])
* 13:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; [[phab:T246539|T246539]])
* 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:35 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:30 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 11:12 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:08 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:02 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:19 hashar@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo: Revert "filerepo: clean up shared cache keys to avoid key metrics clutter" - [[phab:T267668|T267668]] (duration: 01m 01s)
* 09:12 hashar: Pulled https://gerrit.wikimedia.org/r/640746 on deployment server for # [[phab:T267668|T267668]]
* 03:46 ejegg: updated python fundraising tools from {{Gerrit|7853f426ee}} to {{Gerrit|68e054c9ad}}
== 2020-11-11 ==
* 16:44 XioNoX: Revert "temporarily route Italy to codfw"
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:30 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:52 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 14:29 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
* 13:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=cp3054.esams.wmnet
* 12:25 Lucas_WMDE: EU backport&config window done
* 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:640676{{!}}Remove propagateChangeVisibility repo setting]] (duration: 00m 58s)
* 12:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636453{{!}}Enable propagatePageDeletion on Wikidata]] (duration: 00m 59s)
* 12:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/DiscussionTools/includes/CommentParser.php: Backport: [[gerrit:640497{{!}}Fix getHeadlineNodeAndOffset() returning text nodes (T267284)]] (duration: 01m 01s)
* 10:34 XioNoX: delete unused interfaces from asw-d-codfw
* 09:53 XioNoX: prioritized DE-CIX IXP - [[phab:T262681|T262681]]
* 02:18 ryankemper: (WDQS deploy completed)
* 00:48 ryankemper: Restarting `wdqs-categories` one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:47 ryankemper: Restarted `wdqs-categories` across wdqs test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:47 ryankemper: Restarted `wdqs-updater` simultaneously across all wdqs hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:47 ryankemper: [wdqs deploy] following deploy, example query succeeds on `query.wikidata.org`, proceeding to post deploy steps
* 00:46 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@03219df]: 0.3.55 (duration: 11m 24s)
* 00:46 ryankemper: [[phab:T222669|T222669]] [Elasticsearch reindex] Began long-running reindex of cirrus elasticsearch for `codfw`, `eqiad`, and `cloudelastic`. 3 tmux sessions on `ryankemper@mwmaint1002`: `reindex_eqiad`, `reindex_codfw`, `reindex_cloudelastic`
* 00:38 ryankemper: Following deploy to canary `wdqs1003`, automated tests are passing as is a manual test of an example query. Proceeding...
* 00:34 ryankemper@deploy1001: Started deploy [wdqs/wdqs@03219df]: 0.3.55
* 00:32 ryankemper: About to begin wdqs deploy; before-deploy tests on canary `wdqs1003` are passing
* 00:09 eileen: civicrm revision changed from {{Gerrit|d0cd7f6dbb}} to {{Gerrit|e5d12cc46c}}, config revision is {{Gerrit|e2d133eff4}}
== 2020-11-10 ==
* 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:05 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:58 jgleeson: update civicrm revision changed from {{Gerrit|c36a5cc1b1}} to {{Gerrit|d0cd7f6dbb}}
* 21:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:47 ebernhardson: unban elastic1050 from eqiad search psi cluster
* 21:28 cstone: civicrm revision changed from {{Gerrit|b1342c4129}} to {{Gerrit|c36a5cc1b1}}
* 21:24 brennen@deploy1001: sync-file aborted: Testing: README.md sync-file with ssh -n for [[phab:T223287|T223287]] (duration: 00m 37s)
* 21:23 brennen: testing some scap operations, modified to use ssh -n for debugging [[phab:T223287|T223287]]
* 21:11 ebernhardson: ban elastic1050 from eqiad psi cluster due to excessive load
* 21:02 brennen@deploy1001: Finished scap: Backport: [[gerrit:640487{{!}}language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488{{!}}Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]] (duration: 34m 46s)
* 20:27 brennen@deploy1001: Started scap: Backport: [[gerrit:640487{{!}}language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488{{!}}Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]]
* 20:10 brennen@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:640254{{!}}Turn on formatnum logging (T267587, T267370)]] (duration: 01m 02s)
* 19:06 hknust: holger mwmaint1002 Stop [[phab:T219279|T219279]]
* 18:31 hknust: holger mwmaint1002 Start [[phab:T219279|T219279]]
* 17:57 effie: pool mw1263 mw1264
* 17:31 effie: briefly depool mw1263 and mw1264
* 17:30 jynus: about to shutdown db1139 for hw maintenance [[phab:T261405|T261405]]
* 17:13 dwisehaupt: upping thank you mail flow through frmx's to 30% of the total runs
* 16:32 XioNoX: add cloud-storage1-b-codfw to, well, codfw switches - [[phab:T267378|T267378]]
* 16:20 effie: pool mw1263
* 16:17 hashar: Restarting Gerrit on gerrit1001
* 16:12 hashar: Restarted Gerrit on gerrit2001 for config change
* 15:53 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9 (duration: 01m 06s)
* 15:52 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9
* 15:38 moritzm: installing 4.19.152 kernel packages on buster hosts (only installing the package, reboots will happen separately)
* 15:28 effie: depool mw1263 - [[phab:T244340|T244340]]
* 15:09 ejegg: updated fundraising python tools from {{Gerrit|087a596d3a}} to {{Gerrit|7853f426ee}}
* 14:21 effie: pooling mw1276 - [[phab:T244340|T244340]]
* 13:51 moritzm: imported php-memcached 3.0.1+2.2.0-1~wmf3+buster1 to component/php72 for buster-wikimedia
* 13:29 marostegui: Restart db2093 to pick up report_host - [[phab:T266483|T266483]]
* 13:17 marostegui: Restart db1117* to pick up report_host - [[phab:T266483|T266483]]
* 12:46 effie: depool mw1276 to install onhost memcached - [[phab:T244340|T244340]]
* 12:33 Lucas_WMDE: EU backport&config window done
* 12:33 moritzm: installing wireshark security updates
* 12:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:636095{{!}}Switch parser cache to using "mcrouter-with-onhost-tier" (T264604)]] (duration: 00m 57s)
* 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/mc.php: Config: [[gerrit:636094{{!}}Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches (T264604)]] (duration: 00m 57s)
* 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/Wikibase: Backport: [[gerrit:639035{{!}}Revert JS parser commits (T266671)]] (duration: 01m 04s)
* 08:59 hashar: Restarted Gerrit for plugins deployment
* 08:06 hashar: Restarting Gerrit on gerrit2001 / gerrit-replica
* 08:04 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - [[phab:T184086|T184086]] (duration: 00m 10s)
* 08:04 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - [[phab:T184086|T184086]]
* 07:40 elukey: import hue_4.8.0-2 to buster-wikimedia
* 06:53 marostegui: Restart dbstore* to pick up report_host - [[phab:T266483|T266483]]
* 06:44 marostegui: Restart pc1010 to pick up report_host - [[phab:T266483|T266483]]
== 2020-11-09 ==
* 22:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:14 mbsantos@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]]) (duration: 02m 23s)
* 21:11 mbsantos@deploy1001: Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]])
* 20:53 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=maps2002.*
* 20:36 cdanis: depool maps2002
* 20:26 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 01m 09s)
* 20:25 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 20:24 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 11m 36s)
* 20:13 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.16
* 20:04 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:01 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:58 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:32 mepps: updated payments-wiki from {{Gerrit|388490e86d}} to {{Gerrit|8612ed1002}}, config revision is {{Gerrit|987e839869}}
* 17:53 XioNoX: re-order asw-d-codfw interfaces-ranges
* 17:51 XioNoX: standardize asw-d-codfw interfaces descriptions
* 17:33 effie: updating mwdebug2002 to ICU 63 - [[phab:T264991|T264991]]
* 17:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 05s)
* 16:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 16:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 16:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 16:40 moritzm: imported 2.0.2+0.5.7-1~wmf3+php72+buster1 to component/php72 for buster-wikimedia
* 16:34 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=trwiki; [[phab:T246539|T246539]])
* 16:34 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 16:20 XioNoX: Netbox prod: mass import from PuppetDB (cables, etc) - [[phab:T262899|T262899]]
* 16:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:12 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62c2e02f836095ba7e8c7b80d97a52aee885b619}}: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis ([[phab:T266298|T266298]]) (duration: 01m 07s)
* 14:34 hashar: Restarting Gerrit
* 14:07 hashar@deploy1001: Finished deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]] (duration: 00m 18s)
* 14:07 hashar@deploy1001: Started deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]]
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:03 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:59 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:40 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:13 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwikinews --fix --add-prefix=BROKEN # [[phab:T266925|T266925]]
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11b8f6236d159962bdebccd6dcacb72e600ec6b5}}: Add wgNamespaceAliases for zhwikinews ([[phab:T266925|T266925]]) (duration: 01m 06s)
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87b3eede24fb407ddd226ad65817ab8adf44aeb8}}: Enable DiscussionTools as a beta feature on fiwiki ([[phab:T265446|T265446]]) (duration: 01m 06s)
* 11:58 moritzm: installing remaining openldap updates on stretch
* 11:57 jynus: restart dbstore1004 mariadb instances
* 10:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:36 XioNoX: add 185.15.56.240/29 IPs to relevant cloudsw interfaces - [[phab:T265288|T265288]]
* 10:35 effie: merging 638109 and roll restart ms-fe* hosts to pick up the change
* 10:11 XioNoX: renumber cloud-xlink1-eqiad
* 09:56 Urbanecm: Purge https://vote.wikimedia.org/wiki/Main_Page ([[phab:T262689|T262689]])
* 09:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=svwiki; [[phab:T246539|T246539]])
* 09:52 hashar: Restarting Gerrit on gerrit1001 and gerrit2001 in order to have the JVM exit after OutOfMemory # [[phab:T267517|T267517]]
* 09:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b0a81f4294dcedfd5736884900cb561de9a080e}}: Revert "Change votewiki language temporarily to fa for fawiki elections" ([[phab:T262689|T262689]]) (duration: 01m 08s)
* 09:37 moritzm: installing libexif security updates
* 09:06 godog: enable thanos query-frontend on thanos-fe hosts - [[phab:T261281|T261281]]
* 08:24 XioNoX: configure traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 08:11 hashar: Restarting Gerrit on gerrit1001 and gerrit2001
* 07:58 hashar: Restarted CI Jenkins on contint2001 for Java upgrade
* 07:17 elukey: restart gerrit on gerrit2001 (OOM registered two days ago, but systemctl reports uptime of a month; probably in a weird state)
* 01:35 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/tests/phpunit/maintenance/categoryChangesAsRdfTest.php: this was cherry-picked to make CI pass, pushing it out just for a clean staging dir (duration: 01m 06s)
* 01:32 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 01:30 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 07s)
* 01:29 tstarling@deploy1001: sync-file aborted: fixing UBN [[phab:T266903|T266903]] (duration: 00m 01s)
== 2020-11-08 ==
* 23:08 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 23:06 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 35s)
* 20:34 cdanis: repool esams
* 19:48 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:48 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:16 cdanis: depool esams
* 18:35 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:35 cdanis@cumin1001: START - Cookbook sre.network.cf
== 2020-11-06 ==
* 23:38 dwisehaupt: frdata1001 upgraded to buster
* 22:40 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
* 22:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
* 22:29 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
* 22:29 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
* 20:57 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 05s)
* 20:56 reedy@deploy1001: Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 10s)
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 cwhite@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 17:02 dwisehaupt: rolled out new thank_you_mail_send process_control scripts to utilize frmx hosts
* 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
* 14:46 moritzm: installing wireshark security updates
* 14:36 hnowlan: resyncing database on maps1001
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:05 hnowlan: started cassandra bootstrap of maps2005
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 hnowlan: joining maps2005 to cassandra cluster
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 moritzm: uploaded openjdk-8 8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
* 10:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:06 dcausse: restarted elastic on elastic1063 ([[phab:T265113|T265113]])
* 09:57 moritzm: installing spice security updates
* 09:32 moritzm: installing libsndfile security updates
* 09:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 moritzm: installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
* 04:38 ryankemper: [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
* 04:36 ryankemper: Finished restarting wdqs categories one host at a time across all wdqs production instances
* 04:02 ryankemper: Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
* 04:01 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:01 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:00 ryankemper: `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
* 03:59 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
* 03:51 ryankemper: Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
* 03:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
* 03:48 ryankemper: About to begin wdqs deploy, tests passing on canary `wdqs1003`
* 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]]) (duration: 69m 02s)
== 2020-11-05 ==
* 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]])
* 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - FormatMetData.php (T267370)]] (duration: 07m 22s)
* 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370)]] (duration: 01m 08s)
* 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: [[gerrit:639504{{!}}Bump wikimedia/parsoid to 0.13.0-a16 (T267146)]] (duration: 01m 14s)
* 20:54 hnowlan: reenabled tilerator in eqiad
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
* 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
* 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 20:39 hnowlan: finished removenode of maps2002 cassandra
* 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
* 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: [[gerrit:639500{{!}}Dont double-format numeric edit count (T267362)]] (duration: 01m 06s)
* 19:44 Urbanecm: Morning B&C window done
* 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: {{Gerrit|81cb1c7b141d49d7fc931fdc13ffd1b48b3a25ab}}: Suggested edits: Export task count from start editing dialog ([[phab:T266868|T266868]]; [[phab:T263040|T263040]]) (duration: 01m 07s)
* 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|453b9c64c44a256eafdfafe7a0023484377bbbd2}}: Fix DiscussionTools wikis config for thwiki/tgwiki ([[phab:T266303|T266303]]) (duration: 01m 08s)
* 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
* 17:52 akosiaris: restart uwsgi-ores in all ores1* nodes per complaint on IRC that max redis clients have been reached [[phab:T263910|T263910]]
* 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
* 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
* 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 17:41 brennen: train is currently unblocked; rolling to group0 ([[phab:T263182|T263182]])
* 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: [[gerrit:639491{{!}}language: Clean up $separatorTransformTable in km/la/my (T267091)]] (duration: 01m 12s)
* 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: [[gerrit:639495{{!}}mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311)]] (duration: 01m 10s)
* 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
* 17:14 hnowlan: rebuilding cassandra on maps2002
* 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
* 17:05 hnowlan: restarting maps2004 postgres for config change
* 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
* 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:41 moritzm: installing junit4 security updates
* 14:55 elukey: shutdown kafka-jumbo1001 to swap NICs (1g -> 10g)
* 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond42: enable puppet fleet wide after restarting puppetdb
* 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 jbond42: disable puppet fleet wide to restart puppetdb
* 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:52 jbond42: upgrade freetype on jessie
* 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
* 12:09 marostegui: Upgrade mysql on pc2010
* 11:58 jynus: shutting down db1139 in preparation of maintenance [[phab:T261405|T261405]]
* 11:55 marostegui: Upgrade mysql on db1077
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
* 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; [[phab:T246539|T246539]])
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - [[phab:T262512|T262512]]
* 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
* 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
* 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimum weight after being cloned from es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimum weight after being cloned from es1013 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimum weight after being cloned from es1016 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T267216|T267216]]', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
* 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 due to partition 'e' full
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
* 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
* 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
* 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
* 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]
== 2020-11-04 ==
* 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
* 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee0ba541fa55f6707276fdc5bd3f032cb9be3e60}}: Disable the search in header A/B test ([[phab:T265333|T265333]]) (duration: 01m 06s)
* 20:33 ejegg: updated payments-wiki from {{Gerrit|1ad4ba9639}} to {{Gerrit|388490e86d}}
* 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 01m 07s)
* 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|82579bf9d71bd3c9d97da0132ce8d92a8863da5b}}: Enable wgImagePreconnect on remaining wikis ([[phab:T123582|T123582]]) (duration: 01m 06s)
* 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2a57725f8f6fdaa3f40c834e84b43a0260077f2}}: Enable DiscussionTools as a beta feature on almost all Wikipedias ([[phab:T266303|T266303]]) (duration: 01m 07s)
* 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fb5c03262c20b5e99b3c2f6e91abb024f12da1f5}}: Enable wgCheckUserLogLogins at all wikis but loginwiki ([[phab:T253802|T253802]]) (duration: 01m 08s)
* 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
* 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
* 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
* 18:51 Urbanecm: Strip 2FA for Mark83 at SUL ([[phab:T267257|T267257]])
* 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
* 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
* 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
* 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
* 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
* 17:07 effie: Reimage mc1036 for real this time
* 16:40 brennen: 1.36.0-wmf.16 was branched at {{Gerrit|f51ccd2ccef8cba0e7d874b6f7cf4b73bcd36636}} for [[phab:T263182|T263182]]
* 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 15:39 effie: Reimage mc1036 to buster - [[phab:T252391|T252391]]
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - [[phab:T259163|T259163]] (duration: 00m 58s)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 00m 59s)
* 14:37 jynus: restart mysql at db1133 [[phab:T266483|T266483]]
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
* 14:15 jynus: restart mysqls at db209[789], db210[01], db2139, db2141 [[phab:T266483|T266483]]
* 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 jynus: restart mysqls at db1150 [[phab:T266483|T266483]]
* 13:54 jynus: restart mysqls at db1145 [[phab:T266483|T266483]]
* 13:51 jynus: restart mysqls at db1140 [[phab:T266483|T266483]]
* 13:47 jynus: restart mysqls at db1139 [[phab:T266483|T266483]]
* 13:43 jynus: restart mysqls at db1116 [[phab:T266483|T266483]]
* 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 jynus: restart mysqls at db1102 [[phab:T266483|T266483]]
* 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:35 jynus: restart mysqls at db1095 [[phab:T266483|T266483]]
* 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:50 Lucas_WMDE: EU backport&config done
* 12:11 Urbanecm: Run scap pull at snapshot1010 manually
* 12:09 Urbanecm: scap-sync file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ed3c43dc4488205663e6694b7ddfa991e3f3d4b9}}: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267193|T267193]]) (duration: 01m 02s)
* 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
* 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; [[phab:T246539|T246539]])
* 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
* 10:05 _joe_: restarting envoyproxy on restbase20<nowiki>{</nowiki>09,10<nowiki>}</nowiki> to test poolcounter usage by the safe restart scripts
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
* 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
* 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 [[phab:T261717|T261717]]
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
* 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
* 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - [[phab:T253438|T253438]]
* 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json
== 2020-11-03 ==
* 22:56 _joe_: repooling mw1346
* 22:55 _joe_: depooling mw1346
* 22:49 cdanis: mw1342 restart-php7.2-fpm
* 22:37 cdanis: repool mw1278 and mw1279
* 22:35 cdanis: ✔️ cdanis@mw1290.eqiad.wmnet ~ 🕠🍺 sudo restart-php7.2-fpm
* 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
* 22:31 cdanis: depool mw1276 and mw1279 also
* 22:25 cdanis: ✔️ cdanis@mw1278.eqiad.wmnet ~ 🕠🍺 sudo depool
* 21:16 hashar: Gerrit: triggering java garbage collection # [[phab:T263008|T263008]]
* 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
* 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM [[phab:T265113|T265113]]
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
* 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
* 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
* 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
* 13:43 sobanski: Removing db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
* 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
* 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
* 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 03s)
* 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:23 hnowlan: resyncing postgres replica maps1001
* 11:03 Amir1: rolling restart of ores
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 07s)
* 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 26s)
* 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
* 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
* 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
* 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:32 godog: Prometheus re-enable compactions - [[phab:T261281|T261281]]
* 06:59 marostegui: Remove db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl [[phab:T267088|T267088]]', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
* 06:46 marostegui: Deploy schema change on s1 codfw master: [[phab:T265349|T265349]]
* 06:16 marostegui: Stop MySQL on es1014 to clone es1028 [[phab:T261717|T261717]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
* 06:11 marostegui: Stop MySQL on es1012 to clone es1027 [[phab:T261717|T261717]]
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
* 06:04 marostegui: Stop MySQL on es1011 to clone es1026 [[phab:T261717|T261717]]
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
* 04:39 cstone: civicrm revision changed from {{Gerrit|cd13d9e30f}} to {{Gerrit|b1342c4129}}
* 02:13 shdubsh: restart ES on logstash1009 - oom killed
* 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime
== 2020-11-02 ==
* 22:19 twentyafterfour: restart php7.3-fpm on phab1001
* 22:03 twentyafterfour: applied {{Gerrit|113a244a66}} on phab1001 to hotfix [[phab:T240862|T240862]]
* 20:22 eileen: process-control config revision is {{Gerrit|313a36312f}} re-enable thank you
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 19:47 eileen: civicrm revision changed from {{Gerrit|3317d30356}} to {{Gerrit|cd13d9e30f}}, config revision is {{Gerrit|db912e3bba}}
* 19:45 eileen: process-control config revision is {{Gerrit|db912e3bba}} - thankyou job off for testing
* 19:07 Urbanecm: Deployed security fix for [[phab:T205908|T205908]]
* 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
* 18:59 mutante: decom'ing testvm1001
* 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 XioNoX: push new pfw policies - [[phab:T267051|T267051]]
* 16:39 ejegg: updated payments-wiki from {{Gerrit|adc3369cb3}} to {{Gerrit|1ad4ba9639}}
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
* 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
* 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - [[phab:T266024|T266024]] (duration: 00m 58s)
* 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:40 elukey: roll restart zookeeper on an-conf* to pick up new openjdk upgrades
* 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:03 Lucas_WMDE: EU backport&config window done
* 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: [[gerrit:637801{{!}}Revert JS parser commits (T266671)]] (duration: 01m 09s)
* 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637819{{!}}Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917)]] (duration: 00m 58s)
* 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 2/2 (Beta) (duration: 00m 57s)
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 1/2 (production) (duration: 01m 02s)
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:638020{{!}}Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]] (duration: 00m 58s)
* 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
* 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]], Beta part (prod no-op) (duration: 00m 58s)
* 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]] (duration: 00m 59s)
* 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
* 11:51 effie: disable puppet on thumbor1001 and thumbor1002 to test 636024
* 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
* 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - [[phab:T261281|T261281]]
* 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 10:23 moritzm: installing openldap security updates on corp LDAP replicas
* 08:46 XioNoX: add uRPF strict to ulsfo office links - [[phab:T266561|T266561]]
* 08:41 moritzm: installing openldap security updates on LDAP replicas
* 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - [[phab:T261281|T261281]]
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
== 2020-11-01 ==
* 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # [[phab:T266976|T266976]]
* 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
* 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
* 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
* 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
* 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json
== 2020-10-31 ==
* 00:12 mutante: removed Nuria from wmf group, she is already in nda group ([[phab:T266086|T266086]])
== 2020-10-30 ==
* 23:35 foks: removing two files for legal compliance
* 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet [[phab:T266702|T266702]]
* 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - [[phab:T266164|T266164]]
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 20:56 mutante: mw1267,mw1268 - scap pull
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple hosts down for maintenance
* 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]] (duration: 05m 14s)
* 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]]'
* 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
* 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:19 effie: disable puppet on mc1036 and mc2036 - [[phab:T252391|T252391]]
* 17:18 effie: enable puppet on all mediawiki and mc* hosts
* 16:19 elukey: kafka-jumbo1006 still running with 1g NIC
* 15:36 effie: stopping puppet on mediawiki and mc* hosts
* 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 rzl: downtiming mc2036 for buster reimage
* 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
* 14:14 cmjohnson1: moving mw1267 and mw1268 to rack A8 eqiad [[phab:T266164|T266164]]
* 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
* 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the birthday (duration: 01m 12s)
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:54 elukey: decom an-tool1006 (old analytics test vm) - [[phab:T255139|T255139]]
* 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission
== 2020-10-29 ==
* 23:59 eileen: process-control config revision is {{Gerrit|6891d35bce}}
* 23:39 Urbanecm: Evening B&C window done
* 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # [[phab:T266605|T266605]] # P13112
* 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddb7e08e9c1d07f704c9f7585d8b6089f1895b5c}}: Add namespace aliases to Turkish Wikiquote ([[phab:T266605|T266605]]) (duration: 00m 57s)
* 23:36 eileen: process-control config revision is {{Gerrit|1114512f90}}
* 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # [[phab:T266606|T266606]] # P13111
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3a8555154673c4c5a65f6ec2a1219d0832f48e0}}: Add namespace aliases to Turkish Wikisource ([[phab:T266606|T266606]]) (duration: 00m 56s)
* 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # [[phab:T266608|T266608]]
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1800d11ec8c07ff6ccffe0fd03ce11e6786f8a6e}}: Add namespace aliases to Turkish Wikibooks ([[phab:T266608|T266608]]) (duration: 00m 57s)
* 23:22 eileen: civicrm revision changed from {{Gerrit|e1d65b0f3a}} to {{Gerrit|3317d30356}}, config revision is {{Gerrit|d70fe02cb9}}
* 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix # [[phab:T266609|T266609]]
* 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|090f75730727e7a3ca5a85af0ff9071213dd047f}}: Add namespace aliases to Turkish Wiktionary ([[phab:T266609|T266609]]) (duration: 00m 58s)
* 22:35 mutante: mw1268 - depooled for [[phab:T266164|T266164]]
* 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scap proxy was removed - deploy1001 scap-proxies dsh group was adjusted
* 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically ([[phab:T266164|T266164]])
* 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install [[phab:T263284|T263284]]
* 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
* 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:06 mutante: depooled mw1267 ([[phab:T266164|T266164]])
* 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
* 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; [[phab:T246539|T246539]])
* 19:13 Amir1: rolling restart of ores uwsgi
* 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote ([[phab:T266744|T266744]]) (duration: 00m 57s)
* 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # [[phab:T266744|T266744]]
* 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b7eaaab81e1665c478f5dc1fdb495e36c53e7863}}: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální ([[phab:T245639|T245639]]) (duration: 00m 57s)
* 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:29 hashar: Restarted CI Jenkins a bit ago
* 17:15 hashar: CI: killed all java agents (java upgrade)
* 17:12 hashar: Stopping CI Jenkins
* 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - [[phab:T265288|T265288]]
* 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
* 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - [[phab:T265288|T265288]]
* 16:34 XioNoX: force VRRP master on cr1-eqiad - [[phab:T265288|T265288]]
* 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
* 15:22 moritzm: installing bacula updates from Buster point release
* 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: {{Gerrit|483c3bceb926ac6a2cfc40112fb9b4f0671fef72}}: Attempt to add a query cache to DPL ([[phab:T263220|T263220]]) (duration: 00m 58s)
* 15:16 papaul: poweroff mc2029 for relocation
* 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|19c5aff02c20812c56b8abdcc0ed530393010193}}: Set wgDLPQueryCacheTime to 120 at all wikis ([[phab:T263220|T263220]]) (duration: 00m 59s)
* 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
* 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - [[phab:T265911|T265911]]
* 14:59 papaul: poweroff sessionstore2002 for relocation
* 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
* 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 14:08 godog: bump FS for prometheus codfw global instance
* 13:54 elukey: roll out profile::java on all zookeeper instances
* 13:53 moritzm: installing Java 11 security updates
* 13:52 bblack: authdns1001 - restart gdnsd - [[phab:T266746|T266746]]
* 13:46 bblack: authdns2001 - restart gdnsd - [[phab:T266746|T266746]]
* 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 13:25 Urbanecm: Correction: Obviously 1002 ([[phab:T246539|T246539]])
* 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; [[phab:T246539|T246539]])
* 13:21 moritzm: installing bluez security updates on stretch
* 12:56 marostegui: Make orchestrator discover pc2 [[phab:T266485|T266485]]
* 12:55 marostegui: Deploy orchestrator grants on pc2 [[phab:T266485|T266485]]
* 12:44 marostegui: Deploy grants for cluster alias on pc1 [[phab:T266485|T266485]]
* 12:35 moritzm: upgrade idp-test* hosts to latest Java security updates
* 12:35 moritzm: restart idp-test
* 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
* 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 11:14 Urbanecm: EU B&C window done
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|28152b7387082b79d71cfbf28be740ffe629ee50}}: Add another SDC property to search for matching media statements ([[phab:T264925|T264925]]) (duration: 00m 58s)
* 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
* 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
* 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
* 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
* 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - [[phab:T266746|T266746]]
* 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) [[phab:T264109|T264109]]
* 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm)
* 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - [[phab:T258405|T258405]]
* 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
* 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
* 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
* 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
* 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
* 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 [[phab:T266663|T266663]]
* 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
* 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 [[phab:T266663|T266663]]
* 07:50 vgutierrez: restart haproxy on authdns2001
* 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 [[phab:T266663|T266663]]
* 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 [[phab:T266663|T266663]]
* 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
* 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
* 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
* 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 [[phab:T266663|T266663]]
* 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 [[phab:T266663|T266663]]
* 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 [[phab:T266663|T266663]]
* 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full
* 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 [[phab:T266663|T266663]]
* 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 [[phab:T266663|T266663]]
* 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 [[phab:T266663|T266663]]
* 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 [[phab:T266663|T266663]]
* 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore ([[phab:T257906|T257906]])
* 01:17 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
* 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
* 00:14 Amir1: rolling restart of ores
* 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
* 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)
== 2020-10-28 ==
* 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
* 23:52 ryankemper@deploy1001: deploy aborted:  0.3.53 (duration: 00m 00s)
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]:  0.3.53
* 22:54 mutante: scandium - scap pull after reinstalling OS
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
* 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
* 19:56 jgleeson: updated Smashpig from {{Gerrit|2246685626}} to {{Gerrit|09f29c1da5}}
* 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:56 tgr_: Morning deploys done
* 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983{{!}}Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)
* 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791{{!}}Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)
* 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956{{!}}Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)
* 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787{{!}}Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)
* 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880{{!}}Removing obsolete license definition]] (duration: 01m 00s)
* 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:30 hnowlan: reimporting OSM data for eqiad
* 17:24 hnowlan: removing OSM database on maps1004
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:16 hnowlan: Disabling tilerator in eqiad
* 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:51 Amir1: restarting uwsgi on ores in eqiad
* 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:10 godog: roll restart logstash5 in codfw
* 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 12:39 moritzm: installing libdatetime-timezone-perl updates
* 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - [[phab:T266561|T266561]]
* 10:39 ema: due to [[phab:T266651|T266651]], cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - [[phab:T266648|T266648]]
* 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
* 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:37 jynus: updated dump grants on db2093
* 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - [[phab:T257905|T257905]]
* 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 04:43 ryankemper: [[phab:T266492|T266492]] Finished rolling restart of codfw cirrus cluster
* 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 02:58 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
* 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
* 02:12 eileen: tools revision changed from {{Gerrit|a2a91d6c6a}} to {{Gerrit|087a596d3a}}
* 00:40 eileen: civicrm revision changed from {{Gerrit|4fdfb8408b}} to {{Gerrit|e1d65b0f3a}}, config revision is {{Gerrit|f16003ab62}}
== 2020-10-27 ==
* 22:20 mutante: systemctl reset-failed on various servers to see which come back later from failed auto_restart and which don't
* 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
* 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
* 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
* 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job ([[phab:T266024|T266024]])
* 19:40 eileen: civicrm revision changed from {{Gerrit|bb7c08bf6d}} to {{Gerrit|4fdfb8408b}}, config revision is {{Gerrit|f16003ab62}}
* 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
* 17:18 ejegg: updated payments-wiki from {{Gerrit|4c1503ad91}} to {{Gerrit|adc3369cb3}}
* 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:42 mepps: updated payments-wiki-staging from {{Gerrit|5fdd29bc16}} to {{Gerrit|4c1503ad91}}
* 15:25 ema: cp4032: downgrade varnish to 6.0.4 [[phab:T264398|T264398]]
* 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 [[phab:T266567|T266567]]
* 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 [[phab:T266567|T266567]]
* 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
* 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
* 14:27 _joe_: restart php-fpm on jobrunners in codfw
* 14:17 cdanis: ran puppet on alert1001
* 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
* 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago [[phab:T261487|T261487]]
* 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - [[phab:T265589|T265589]]
* 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - [[phab:T265589|T265589]]
* 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - [[phab:T265589|T265589]]
* 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - [[phab:T265589|T265589]]
* 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - [[phab:T265589|T265589]]
* 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
* 06:42 ryankemper: [[phab:T263970|T263970]] Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue
== 2020-10-26 ==
* 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set ([[phab:T266501|T266501]]) (duration: 01m 00s)
* 22:30 mutante: netflow5001 - systemctl reset-failed
* 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
* 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
* 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]]) (duration: 06m 53s)
* 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]])
* 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) ([[phab:T265556|T265556]]) (duration: 00m 57s)
* 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki ([[phab:T264990|T264990]]) (duration: 00m 58s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki ([[phab:T265690|T265690]]) (duration: 00m 58s)
* 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A ([[phab:T265372|T265372]], [[phab:T265556|T265556]]) (duration: 00m 58s)
* 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs ([[phab:T266285|T266285]]) (duration: 00m 58s)
* 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code ([[phab:T71729|T71729]]) (duration: 00m 58s)
* 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist ([[phab:T265558|T265558]]) (duration: 00m 59s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis ([[phab:T265556|T265556]]) (duration: 00m 58s)
* 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 17:48 mutante: an-worker109* - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:45 mutante: releases2002, netmon2001, various other hosts - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: [[phab:T265809|T265809]], {{Gerrit|I1011f63ae61f5a6}} (duration: 01m 00s)
* 16:41 XioNoX: bounce security log on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
* 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
* 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
* 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
* 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
* 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
* 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
* 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
* 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
* 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
* 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
* 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
* 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
* 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki (duration: 16m 43s)
* 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
* 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
* 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
* 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
* 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
* 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
* 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
* 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
* 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
* 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
* 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
* 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
* 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org [[phab:T265857|T265857]]
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bff6b37a55fe8f260fe00cbb942c53101167fb07}}: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T266390|T266390]]) (duration: 01m 14s)
* 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - [[phab:T265911|T265911]]
* 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
* 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - [[phab:T265911|T265911]]
* 10:18 godog: roll restart pybal to apply latest configuration
* 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
* 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
* 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:58 moritzm: installing freetype security updates for stretch
* 08:57 XioNoX: remove down sessions to AS38758
* 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:43 XioNoX: remove down sessions to AS8560
* 08:41 XioNoX: remove down sessions to AS31334
* 08:28 XioNoX: remove down sessions to AS6327
* 08:27 XioNoX: remove down sessions to AS8674
* 08:25 XioNoX: remove down sessions to AS24429
* 08:21 XioNoX: remove down sessions to AS16509
* 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
* 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
* 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
* 06:10 marostegui: Warm up tables [[phab:T261914|T261914]]
== 2020-10-25 ==
* 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
* 15:50 dwisehaupt: kernel upgrade and reboot for fran1001
== 2020-10-23 ==
* 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins ([[phab:T266086|T266086]])
* 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ [[phab:T266155|T266155]]
* 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - [[phab:T266023|T266023]] (forgot to log this earlier)
* 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - [[phab:T257905|T257905]]
* 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
* 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes [[phab:T264388|T264388]]
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
* 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - [[phab:T257905|T257905]]
== 2020-10-22 ==
* 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - [[phab:T257940|T257940]]
* 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded [[phab:T265963|T265963]]
* 21:56 mutante: deploy1002 - scap pull and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver ([[phab:T265963|T265963]])
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
* 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
* 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet [[phab:T265963|T265963]]
* 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand  to group1 ([[phab:T123582|T123582]]) (duration: 00m 56s)
* 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - [[phab:T258729|T258729]]
* 18:07 chaomodus: Updating eqiad public network DNS to automation
* 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - [[phab:T258729|T258729]]
* 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
* 17:46 chaomodus: Updating eqiad private network DNS to automation
* 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 17:21 bd808@cumin1001: Added views for new wiki: smnwiki [[phab:T264900|T264900]]
* 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
* 14:05 ottomata: bump camus version to wmf12 for all camus jobs. should be no-op now. - [[phab:T251609|T251609]]
* 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - [[phab:T251609|T251609]] (duration: 01m 02s)
* 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 [[phab:T264388|T264388]]
* 13:41 moritzm: pooling ldap-replica1001/1002 [[phab:T264388|T264388]]
* 13:10 moritzm: depooling ldap-replica2001/2002 [[phab:T264388|T264388]]
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
* 13:01 moritzm: pooling ldap-replica2004 [[phab:T264388|T264388]]
* 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - [[phab:T251609|T251609]] (duration: 01m 05s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|52ad2d4df1164dced684231c12aa64bd028b8ac9}}: Do not log logins at loginwiki via CU ([[phab:T253802|T253802]]) (duration: 01m 06s)
* 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 11:59 Lucas_WMDE: EU backport&config window done
* 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 2/2 (duration: 01m 04s)
* 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 1/2 (duration: 01m 02s)
* 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; [[phab:T246539|T246539]])
* 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
* 11:38 marostegui: Compare s1-s8 tables - [[phab:T261914|T261914]]
* 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: [[gerrit:635813{{!}}Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php]] (duration: 01m 08s)
* 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
* 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
* 10:43 moritzm: installing freetype security updates for stretch/buster
* 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
* 08:31 kormat: enabling replication from eqiad to codfw [[phab:T261914|T261914]]
* 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 03:37 eileen: civicrm revision changed from {{Gerrit|4dce7bf535}} to {{Gerrit|bb7c08bf6d}}, config revision is {{Gerrit|9a522d03dd}}
* 03:13 eileen: civicrm revision changed from {{Gerrit|3c3dcf80ae}} to {{Gerrit|4dce7bf535}}, config revision is {{Gerrit|9a522d03dd}}
* 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
* 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
* 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52
== 2020-10-21 ==
* 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: [[phab:T266033|T266033]] (duration: 01m 05s)
* 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: [[phab:T265751|T265751]] [[phab:T265754|T265754]] (duration: 01m 08s)
* 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting ([[phab:T257940|T257940]], [[phab:T257906|T257906]])
* 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 18:13 Urbanecm: Morning B&C window done
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|45312d359442d274e83deb7be80f86e12fb9e864}}: [WikibaseMediaInfo] Fix concept chips array nesting structure ([[phab:T256431|T256431]]) (duration: 01m 05s)
* 18:12 mepps: updated payments-wiki-staging from {{Gerrit|db03677b2d}} to {{Gerrit|5fdd29bc16}}
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d94e33ff39b300c74fcaf08d1746c089fb1af783}}: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
* 17:56 XioNoX: configure FB PNI in eqdfw
* 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, [[phab:T266021|T266021]] (duration: 01m 06s)
* 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
* 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
* 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
* 16:46 effie: restart php-fpm and pool mw2252 and mw2328
* 15:58 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
* 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
* 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia [[phab:T264388|T264388]]
* 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
* 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
* 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler [[phab:T242554|T242554]] (duration: 01m 07s)
* 14:34 dcausse: restarting blazegraph on codfw servers ([[phab:T263952|T263952]])
* 13:21 moritzm: pooling ldap-replica2003 [[phab:T264388|T264388]]
* 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
* 11:40 matthiasmullie: EU B&C done
* 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|785404fa2b998947d236aebe481ee1abcbd14220}}: Disable registrations stat on Special:TranslationStats ([[phab:T264158|T264158]]) (duration: 01m 05s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11567427c3f7d2908b29046ee56a7b0c0da32c09}}: Enable ContentTranslation in 5 Wikipedias as a default tool ([[phab:T264737|T264737]]; [[phab:T264738|T264738]]; [[phab:T264739|T264739]]; [[phab:T264740|T264740]]; [[phab:T264741|T264741]]) (duration: 01m 30s)
* 11:00 marostegui: Upgrade db2093's mariadb version [[phab:T266003|T266003]]
* 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; [[phab:T246539|T246539]])
* 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - [[phab:T258405|T258405]]
* 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; [[phab:T246539|T246539]]
* 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; [[phab:T246539|T246539]]
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # [[phab:T246539|T246539]]
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - [[phab:T266001|T266001]]
* 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - [[phab:T266001|T266001]]
* 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
* 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy
== 2020-10-20 ==
* 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
* 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
* 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
* 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
* 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
* 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact is that new updates will be delayed, but queries will still keep serving as normal, so fixing this is a priority but note that there's no availability outage
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
* 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:48 effie: depooling mw2328 - [[phab:T266052|T266052]]
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
* 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
* 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|fee2d3be13ae14d7ea51ff2db42090a1c27819bf}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 03s)
* 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|00ef00f59fd2a7a1366161ccc66c260be20e3e50}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 01s)
* 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: {{Gerrit|5eee9b773338e5181867cabec9faefbdeacf67ca}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 06s)
* 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: {{Gerrit|5f8d3de14c116b618f5226419082d5c9a07766fb}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 09s)
* 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
* 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - [[phab:T266001|T266001]]
* 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
* 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
* 11:37 liw: 1.36.0-wmf.14 was branched at {{Gerrit|1b7b5f716015f9303d37158820dadf759e8db707}} for [[phab:T263180|T263180]]
* 11:35 Lucas_WMDE: EU backport/config window done
* 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: [[gerrit:635030{{!}}SearchSatisfaction: Set isAnon field (T259250)]] (duration: 00m 57s)
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634039{{!}}Set Wikidata MF to collapse sections by default (T239195)]] (duration: 00m 56s)
* 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634938{{!}}Remove noratelimit from Wikidata bot group (T258354)]] (duration: 00m 56s)
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 09:59 dcausse: [[phab:T255399|T255399]]: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
== 2020-10-19 ==
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
* 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
* 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed [[phab:T265490|T265490]]
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
* 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
* 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:41 eileen: drush vset match_on_import 1
* 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
* 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
* 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
* 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
* 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
* 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
* 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:33 mutante: wtp2001 - sudo confctl decommission
* 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
* 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki ([[phab:T243445|T243445]], [[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18902aa75efafb7d56ca347c12781dbe59f2f8ad}}: Change votewiki language temporarily to fa for fawiki elections ([[phab:T262689|T262689]]) (duration: 00m 56s)
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki ([[phab:T243445|T243445]]) (duration: 00m 57s)
* 18:29 tzatziki: removing 10 files for legal compliance
* 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present ([[phab:T265654|T265654]]) (duration: 00m 58s)
* 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users ([[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature ([[phab:T254349|T254349]]) (duration: 00m 56s)
* 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
* 15:31 elukey: update puppet compilers' facts
* 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
* 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
* 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
* 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
* 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` ([[phab:T246539|T246539]]; wikis.dblist is medium wikis from group2.dblist)
* 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia [[phab:T264388|T264388]]
* 12:48 moritzm: installing httpcomponents-client security updates on Buster
* 12:26 Urbanecm: Creation of smnwiki is done ([[phab:T264859|T264859]])
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
* 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - [[phab:T264900|T264900]]
* 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:15 marostegui: Deploy schema change on smnwiki [[phab:T265321|T265321]] [[phab:T264900|T264900]]
* 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki ([[phab:T264859|T264859]])
* 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
* 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
* 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` ([[phab:T246539|T246539]])
* 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:24 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce92c9814bf9c12cab1a9592dfb32f935d255d93}}: Restore bureaucrat abilities at uzwiki ([[phab:T265746|T265746]]) (duration: 00m 56s)
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26b97261f2b9d1991ea08fe32b6007ba6fe5088f}}: Disable EditorJourney (UnderstandingFirstDay) ([[phab:T252391|T252391]]) (duration: 01m 10s)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis ([[phab:T246539|T246539]])
* 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 ([[phab:T246539|T246539]])
* 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$  mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # [[phab:T246539|T246539]]
* 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
* 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - [[phab:T259780|T259780]]
* 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
* 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - [[phab:T263616|T263616]]
* 08:01 godog: re-enable compaction for prometheus[12]003 - [[phab:T261281|T261281]]
* 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
* 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
* 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27
== 2020-10-17 ==
* 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # [[phab:T264529|T264529]]
== 2020-10-16 ==
* 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 17:43 thcipriani: restarting gerrit due to gc thrashing
* 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
* 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
* 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
* 13:41 effie: pooling mw2279.codfw.wmnet [[phab:T264698|T264698]]
* 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping [[phab:T265571|T265571]] (duration: 01m 12s)
* 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 09:03 XioNoX: eqsin, push CR 634473
* 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
* 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
* 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
* 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
* 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
* 02:14 eileen: civicrm revision changed from {{Gerrit|585eb835d8}} to {{Gerrit|3c3dcf80ae}}, config revision is {{Gerrit|f76d7849bc}}
* 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
* 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf
== 2020-10-15 ==
* 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
* 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: {{Gerrit|I245e84e0b8c}} (duration: 01m 10s)
* 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
* 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* ([[phab:T265558|T265558]])
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
* 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
* 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 ([[phab:T263179|T263179]])
* 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
* 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
* 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 29s)
* 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 51s)
* 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector ([[phab:T264339|T264339]]) (duration: 01m 51s)
* 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools ([[phab:T264339|T264339]]) (duration: 01m 43s)
* 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" ([[phab:T256173|T256173]]) (duration: 01m 58s)
* 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong ([[phab:T265560|T265560]]) (duration: 02m 07s)
* 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) [[phab:T265558|T265558]]
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
* 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
* 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
* 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: {{Gerrit|fd94002cf6070180a289296ec65ad224e5a0ae67}}: Revert "Validate username input before constructing subpage links" ([[phab:T265606|T265606]]) (duration: 02m 48s)
* 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
* 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
* 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
* 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
* 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 [[phab:T264074|T264074]]
* 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
* 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - [[phab:T265579|T265579]]
* 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 [[phab:T264074|T264074]]
* 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T264074|T264074]]
* 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T26407|T26407]]
* 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
* 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
* 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:47 vgutierrez: restart ats-backend on cp3050
* 10:00 akosiaris: [[phab:T264209|T264209]]. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
* 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
* 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
* 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
* 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
* 00:36 eileen: tools revision changed from {{Gerrit|d4e08c52de}} to {{Gerrit|a2a91d6c6a}}
* 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 00:24 twentyafterfour: phabricator update was uneventful
* 00:13 twentyafterfour: updating phabricator
== 2020-10-14 ==
* 23:35 foks: Removing one further file for legal compliance
* 23:28 foks: Removing nine files for legal compliance
* 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
* 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
* 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
* 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
* 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
* 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
* 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
* 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: [[gerrit:634002{{!}}Make attribution source logic more defensive]] [[phab:T263599|T263599]] (duration: 01m 05s)
* 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 ([[phab:T123582|T123582]]) (duration: 01m 03s)
* 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: [[gerrit:634086{{!}}Stylesheet needs to be compatible with cached HTML]] [[phab:T265543|T265543]] (duration: 01m 07s)
* 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: [[phab:T263179|T263179]])
* 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
* 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
* 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
* 19:14 mutante: depooling 5 of the older parsoid servers in codfw
* 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
* 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # [[phab:T265347|T265347]]
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6a56bb7fb762c53db5965f2698a93db2433d33d}}: Add rollbacker right on uzwiki ([[phab:T265509|T265509]]) (duration: 01m 04s)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0da89998e4e380f3ebe527a42a47dc66c49ee4d2}}: Add spamblacklistlog as a default right for the CU log user ([[phab:T239288|T239288]]) (duration: 01m 05s)
* 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - [[phab:T255138|T255138]]
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - [[phab:T255138|T255138]]
* 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:24 jayme: enabled and ran puppet on deploy1001 - [[phab:T260917|T260917]]
* 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - [[phab:T255138|T255138]]
* 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - [[phab:T260917|T260917]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T264209|T264209]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T265183|T265183]]
* 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
* 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
* 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - [[phab:T258405|T258405]]
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:43 moritzm: imported php-memcached, php-redis to component/icu63 [[phab:T264991|T264991]]
* 11:25 Urbanecm: EU B&C window completed
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c63632de6a20b2f00da91187e5cf416fd39d8c5b}}: Enable DiscussionTools as a beta feature on 30 more wikis ([[phab:T264693|T264693]]) (duration: 01m 15s)
* 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 [[phab:T264991|T264991]]
* 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 [[phab:T264991|T264991]]
* 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
* 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks [[phab:T265445|T265445]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json
== 2020-10-13 ==
* 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A ([[phab:T265372|T265372]]) (duration: 01m 04s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki ([[phab:T265214|T265214]]) (duration: 01m 04s)
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer ([[phab:T260582|T260582]]) (duration: 01m 04s)
* 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki ([[phab:T264780|T264780]]) (duration: 01m 04s)
* 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and the alert was gone at the next check
* 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
* 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
* 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
* 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
* 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
* 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
* 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
* 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
* 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
* 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
* 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
* 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
* 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors ([[phab:T263177|T263177]]). promoting to all wikis
* 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw [[phab:T238036|T238036]]
* 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 robh: scs-c1-codfw firmware update via [[phab:T238036|T238036]]
* 17:47 marxarelli: 1.36.0-wmf.13 branched at {{Gerrit|a6be801fc6331a6a6b96f02f368750200d50ab09}} for [[phab:T263179|T263179]]
* 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
* 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors ([[phab:T263177|T263177]]). preparing to promote to group1
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
* 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
* 15:56 papaul: power down ms-be2036 for maintenance
* 15:02 godog: bounce logstash on logstash1007, GC death
* 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b28fd685b9cb8d8e93650b5d02bc41b81d0883c}}: Add setmentor to wgAvailableRights (duration: 00m 59s)
* 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # [[phab:T265336|T265336]]
* 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 [[phab:T264991|T264991]]
* 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # [[phab:T265336|T265336]]
* 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # [[phab:T265336|T265336]]
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance [[phab:T263837|T263837]] ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
* 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 [[phab:T264991|T264991]]
* 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 [[phab:T264991|T264991]]
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90028b4c3c1cd4407e0834d603ccb8b256f2498e}}: Add suppressredirect right to reviewers on bnwiki ([[phab:T265169|T265169]]) (duration: 00m 58s)
* 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # [[phab:T265336|T265336]]`
* 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001, need to wait for a long-running cookbook to end before upgrading both hosts
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e61fcebe7315f73d1fb4d531da37d2c1253115ee}}: Add namespace aliases for Turkish Wikipedia ([[phab:T265336|T265336]]) (duration: 00m 59s)
* 10:47 jayme: no-change rolling restart of push-notifications in codfw - [[phab:T265258|T265258]]
* 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
* 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap [[phab:T264074|T264074]]
* 10:09 ema: cp3050: *reload* varnishkafka-webrequest [[phab:T264074|T264074]]
* 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
* 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:39 kormat: running schema change against s1 in eqiad [[phab:T259831|T259831]]
* 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:43 kormat: running schema change against s3 in eqiad [[phab:T259831|T259831]]
* 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 07:37 moritzm: installing ruby security updates on stretch
* 07:02 moritzm: installing PHP 7.0 security updates
* 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
* 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 [[phab:T263443|T263443]]
== 2020-10-12 ==
* 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - [[phab:T265208|T265208]]
* 15:41 godog: roll-restart logstash5 in codfw
* 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
* 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 [[phab:T264991|T264991]]
* 12:39 moritzm: installing rails security updates on Stretch
* 12:26 moritzm: installing spice security updates on Buster
* 11:38 Urbanecm: EU B&C done
* 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fff2532424f84970962f7de1e35d4250b83cb3da}}: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4966e8a6b8ae4e6d5623dd35e65ed8fcf3338bc1}}: Enable wgCheckUserLogLogins at all wikis but few large wikis ([[phab:T253802|T253802]]) (duration: 00m 58s)
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631809{{!}}Require autoconfirmed status to edit Wikidata Properties (T254280)]] (duration: 01m 00s)
* 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
* 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 [[phab:T264991|T264991]]
* 07:54 godog: reboot ms-be2036 - [[phab:T265208|T265208]]
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
== 2020-10-10 ==
* 01:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633281{{!}}Enable session-ip log channel everywhere (T264799)]] (duration: 00m 59s)
* 00:54 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633277{{!}}Enable session-ip log channel on all but enwiki (T264799)]] (duration: 01m 01s)
* 00:18 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633276{{!}}Enable session-ip log channel on eswiki (T264799)]] (duration: 00m 55s)
* 00:13 mutante: built prometheus-nutcracker-exporter for buster and imported on apt1001 (0.2+nmu1)
== 2020-10-09 ==
* 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633274{{!}}Enable session-ip log channel on Wikidata (T264799)]] (duration: 00m 59s)
* 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633272{{!}}Enable session-ip log channel on Commons (T264799)]] (duration: 00m 59s)
* 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role ([[phab:T260271|T260271]])
* 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
* 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633271{{!}}Enable session-ip log channel on group1, except Commons/Wikidata (T264799)]] (duration: 00m 57s)
* 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 04s)
* 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633210{{!}}Enable session-ip log channel on group0 (T264799)]] (duration: 00m 59s)
* 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 06s)
* 22:01 tgr_: rolling out [[phab:T264799|T264799]]#6533622
* 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # [[phab:T263935|T263935]]
* 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
* 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
* 20:04 dwisehaupt: upgrading payments1001 to buster
* 19:14 dwisehaupt: upgrading payments1002 to buster
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:30 dwisehaupt: upgrading payments1003 to buster
* 17:53 dwisehaupt: upgrading payments1004 to buster
* 17:52 cstone: civicrm revision changed from {{Gerrit|b86a15a430}} to {{Gerrit|585eb835d8}}, config revision is {{Gerrit|57843925bb}}
* 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:45 jayme: helm rollback push-notification in eqiad to revision 8
* 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:07 XioNoX: remove user from all network devices
* 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
* 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:36 moritzm: installing xen security updates for buster (libs only)
* 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
== 2020-10-08 ==
* 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
* 23:37 ryankemper: `cloudelastic1005` done
* 23:31 ryankemper: `cloudelastic1004` done
* 23:27 ryankemper: `cloudelastic1003` done
* 23:23 ryankemper: `cloudelastic1002` done
* 23:16 tgr_: Evening deploys done
* 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
* 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632797{{!}}Enable logging of session cookie changes everywhere (T264793)]] (duration: 01m 01s)
* 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
* 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
* 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
* 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
* 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
* 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
* 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
* 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - [[phab:T264793|T264793]] (duration: 01m 01s)
* 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:43 volans: deploying Netbox DNS zone consolidation - [[phab:T264273|T264273]]
* 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
* 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
* 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide:  (duration: 00m 06s)
* 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
* 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632908{{!}}Enable Special:Investigate by default on production (T264357)]] (duration: 01m 06s)
* 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
* 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
* 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - [[phab:T210137|T210137]]
* 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
* 16:19 hashar: Restarting CI Jenkins
* 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 [[phab:T263443|T263443]]
* 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 [[phab:T264991|T264991]]
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:29 kart_: Updated cxserver to 2020-10-08-053343-production ([[phab:T264407|T264407]], [[phab:T264859|T264859]])
* 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
* 10:37 moritzm: installing Postgres security updates on netboxdb1001
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
* 10:32 moritzm: installing Postgres security updates on netboxdb2001
* 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
* 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 godog: roll-restart swift-object-replicator on ms-be2* - [[phab:T261633|T261633]]
* 08:19 kormat: running schema change against s8 in eqiad [[phab:T259831|T259831]]
* 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 gehel: repooling wdqs2002
* 07:55 marostegui: Rebuild db2125 from snapshots - [[phab:T260670|T260670]]
* 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
* 07:40 gehel: depooled wdqs2002 to catch up on lag
* 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
* 07:23 moritzm: installing pyzmq updates from Buster point release
* 07:00 dcausse: depooling wdqs2002 (catching-up lag)
* 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) [[phab:T242453|T242453]]
* 06:51 _joe_: enable notifications for wdqs-ssl-codfw
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 04:05 ejegg: updated fundraising python tools from {{Gerrit|5515923ef7}} to {{Gerrit|d4e08c52de}}
* 00:31 tgr_: evening deploys done
* 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (again, forgot to rebase the previous time) (duration: 00m 59s)
* 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (duration: 00m 57s)
* 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632795{{!}}Enable logging of session cookie changes in group0 (T264793)]] (duration: 00m 58s)
== 2020-10-07 ==
* 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: [[gerrit:632685{{!}}Log when SessionManager is emitting cookies (T264793)]] (duration: 01m 00s)
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
* 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
* 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in [[phab:T264859|T264859]]. https://en.wikipedia.org/wiki/Inari_Sami {{!}} https://iso639-3.sil.org/code/smn
* 18:30 ryankemper: search team's backport deploy is complete
* 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]] (duration: 00m 58s)
* 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]]'`
* 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
* 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
* 18:20 ryankemper: (backport) HEAD set to {{Gerrit|834b4571f978674162fa805906e665e35ac68e27}} as expected
* 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - [[phab:T261260|T261260]] (duration: 01m 01s)
* 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680  on deployment staging area  and mw2001
* 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 jgleeson: updated civicrm from {{Gerrit|39b4f954ed}} to {{Gerrit|b86a15a430}}
* 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo [[phab:T242602|T242602]]
* 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - [[phab:T259780|T259780]]
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 ([[phab:T263986|T263986]])
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
* 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 04s)
* 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
* 11:22 Urbanecm: EU B&C window done
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f85bc3056f809910c0487fb0b0559b3de92b1992}}: Enable bot passwords at all fishbowl and private wikis ([[phab:T258356|T258356]]) (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
* 11:14 urbanecm@deploy1001: sync-file aborted: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6cdeea2c4c15780a641722157584f12febedab2a}}: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia ([[phab:T264161|T264161]]) (duration: 00m 59s)
* 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 [[phab:T263443|T263443]]
* 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
* 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 [[phab:T264755|T264755]] ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
* 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - [[phab:T264588|T264588]]
* 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
* 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl [[phab:T264700|T264700]]', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
* 07:14 marostegui: Stop MySQL es2015 for decommissioning [[phab:T264700|T264700]]
* 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 02:37 eileen: civicrm revision changed from {{Gerrit|a30da7f92a}} to {{Gerrit|39b4f954ed}}, config revision is {{Gerrit|0ca9a3a055}}
* 01:00 cdanis: repool esams; cr2-esams router upgrade complete
* 00:43 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request chassis routing-engine master switch
* 00:40 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system reboot other-routing-engine
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
* 00:26 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request chassis routing-engine master switch
* 00:22 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system reboot other-routing-engine
* 00:15 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
* 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs ([[phab:T252526|T252526]])
== 2020-10-06 ==
* 23:55 mutante: 🖧  switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* ([[phab:T252526|T252526]]) 🖧
* 23:53 mutante: 🖧  switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* ([[phab:T252526|T252526]]) 🖧
* 23:52 mutante: 🖧  switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* ([[phab:T252526|T252526]]) 🖧
* 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
* 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
* 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
* 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
* 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 Urbanecm: Morning B&C done
* 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: {{Gerrit|2118d265c0f5b6c914efeba86ba7eacd30c5ee0f}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 00s)
* 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: {{Gerrit|d428ccbdf3be9a45139f8b8c0874c113f1732198}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 03s)
* 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 58s)
* 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 59s)
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 [[phab:T264043|T264043]] (duration: 00m 59s)
* 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 [[phab:T264637|T264637]] (duration: 00m 58s)
* 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 [[phab:T264637|T264637]] (duration: 00m 58s)
* 15:41 godog: centrallog* delete archived logs from old, single file, organization
* 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
* 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - [[phab:T263789|T263789]]
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - [[phab:T262946|T262946]]
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
* 14:36 hnowlan: repooling restbase2009
* 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
* 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 05s)
* 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
* 14:08 marostegui: Reboot db1076 for kernel upgrade [[phab:T264755|T264755]]
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:03 marostegui: Power cycle db1076 [[phab:T264755|T264755]]
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
* 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
* 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
* 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - [[phab:T264157|T264157]]
* 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 [[phab:T263443|T263443]]
* 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
* 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
* 12:08 jbond42: deploy puppetlabs-stdlib 5.2
* 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 Urbanecm: EU B&C window done
* 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T264430|T264430]] # P12930
* 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|07c19f97c79ec20d6b1657e589acfc242dd53b09}}: arbcom_ruwiki: Set AK as alias for NS_PROJECT ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b1a4fad0f55c626e42961489062115d5f97ed6c}}: ruewiki: Add rollbacker, grantable and revokable by sysops ([[phab:T264147|T264147]]) (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5cc7027ba8d0ddee5c9898b80afe850603bf870e}}: Allow bureaucrats to remove sysop permissions on Commons ([[phab:T261481|T261481]]) (duration: 00m 58s)
* 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5f9721b3300c8e733d331bcbc754d31d9493f8ba}}: GrowthExperiments: Change Help Page URL for kowiki ([[phab:T254364|T254364]]) (duration: 01m 00s)
* 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
* 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:48 effie: set mw2279.codfw.wmnet as inactive [[phab:T264698|T264698]]
* 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
* 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
* 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
* 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
* 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
* 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
* 09:59 effie: enable puppet on mc20*
* 09:41 effie: enable puppet on mc10*
* 09:38 effie: disable puppet on mc*
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
* 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
* 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 [[phab:T263443|T263443]]
* 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo [[phab:T264700|T264700]] [[phab:T264386|T264386]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 [[phab:T264700|T264700]] ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl [[phab:T264386|T264386]]', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json
== 2020-10-05 ==
* 23:11 ejegg: updated payments staging from {{Gerrit|52704ffe24}} to {{Gerrit|db03677b2d}}
* 22:27 mutante: removing shinken puppet module and role
* 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster [[phab:T264053|T264053]]
* 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings [[phab:T264053|T264053]]
* 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings [[phab:T264053|T264053]]
* 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings [[phab:T264053|T264053]]
* 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
* 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
* 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
* 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
* 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]] (duration: 22m 23s)
* 14:25 elukey: shutdown an-master1001 for ram expansion
* 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]]
* 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:54 elukey: shutdown stat1005 for ram upgrade
* 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
* 12:39 moritzm: installing curl security updates on remaining hosts
* 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" ([[phab:T264295|T264295]]) (duration: 00m 59s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|be73f155001e9095697c3c21a208c63e7bf5d2d1}}: Move changetags right from users to sysop [trwiki] ([[phab:T264508|T264508]]) (duration: 00m 59s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd30b626e23b48146b970c72731f8f7bb1eee9e1}}: wgSkipSkins: Exclude contenttranslation skin from skin options for users ([[phab:T263093|T263093]]) (duration: 00m 59s)
* 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 [[phab:T264398|T264398]]
* 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 [[phab:T264398|T264398]]
* 10:08 moritzm: installing ldap-replica1002 [[phab:T264390|T264390]]
* 09:52 moritzm: installing ldap-replica1001 [[phab:T264390|T264390]]
* 09:22 moritzm: installing ldap-replica2003 [[phab:T264390|T264390]]
* 09:02 hnowlan: bootstrapping restbase1030-b
* 08:57 moritzm: installing ldap-replica2004 [[phab:T264390|T264390]]
* 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
* 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
* 08:23 godog: prometheus codfw/ops, add 100G to the LV
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:46 marostegui: Stop mysql on es2017 [[phab:T264386|T264386]]
* 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 06:52 XioNoX: add static NAT to pfw3-eqiad - [[phab:T264356|T264356]]
* 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 [[phab:T264386|T264386]] ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json
== 2020-10-03 ==
* 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: {{Gerrit|840545f1d9115ea6b672cecce1762d850d8b1f54}}: Restrict flow-hide right to autoconfirmed users on zhwiki ([[phab:T264489|T264489]]) (duration: 01m 17s)
* 00:08 ejegg: updated fundraising CiviCRM from {{Gerrit|256adda03c}} to {{Gerrit|a30da7f92a}}
== 2020-10-02 ==
* 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
* 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:27 effie: enable puppet on mw2271
* 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
* 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
* 17:15 mutante: submitted puppet refactoring change on maps servers
* 16:49 effie: disable puppet on mw2271 and briefly depool it
* 15:39 _joe_: restarting redis on rdb2003, instance 6380
* 15:28 hnowlan: bootstrapping restbase1030-a
* 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server [[phab:T261531|T261531]] {{Gerrit|4573776bd}} {{Gerrit|2fb4c20ae}} (duration: 01m 01s)
* 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
* 14:08 effie: enable puppet on mwdebug1001
* 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
* 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis ([[phab:T258356|T258356]])
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
* 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
* 12:26 effie: disable puppet on mwdebug1001
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
* 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
* 12:05 hnowlan: bootstrapping restbase1029-c
* 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
* 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
* 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
* 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
* 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
* 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
* 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 09:12 jayme: running puppet on lvs servers - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T264221|T264221]])
* 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:07 hnowlan: bootstrapping restbase1029-b cassandra
* 09:05 hashar: gerrit: running garbage collector
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
* 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
* 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
* 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
* 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 08:29 moritzm: installing pyzmq bugfix update from buster point release
* 08:24 moritzm: installing nginx security updates on puppetdb*
* 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
* 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 07:42 moritzm: installing libcommons-compress-java security updates
* 07:35 godog: swift codfw-prod bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:29 godog: prometheus codfw/k8s, add 50G to the LV
* 07:23 moritzm: installing libx11 security updates on buster
* 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at [[phab:T264362|T264362]]
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json
== 2020-10-01 ==
* 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
* 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
* 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at {{Gerrit|7ab9a74c9ebbb22ad9fb9b7c95c91b7fad8bf8c6}} for [[phab:T264365|T264365]]
* 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well [[phab:T264363|T264363]]
* 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at {{Gerrit|796693cb7a2ee3191fcbe19769d341bd0530bd4a}} for [[phab:T264365|T264365]]
* 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
* 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs [[phab:T263177|T263177]] (duration: 01m 06s)
* 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs [[phab:T263177|T263177]]
* 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train [[phab:T264257|T264257]] [[phab:T263177|T263177]] (duration: 00m 59s)
* 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days [[phab:T264053|T264053]] (duration: 00m 59s)
* 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 13m 42s)
* 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 01m 34s)
* 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to newer version but wasn't working, reverted to previous one
* 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - [[phab:T258729|T258729]]
* 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - [[phab:T264227|T264227]]
* 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - [[phab:T264227|T264227]]
* 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:55 moritzm: installing npm security updates on buster
* 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:42 jayme: running puppet on lvs servers - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:35 Urbanecm: Create bot_passwords table at all private wikis ([[phab:T258356|T258356]])
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
* 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
* 14:12 effie: enable puppet on mw2271
* 14:08 moritzm: installing pillow security updates
* 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
* 13:59 moritzm: installing nginx security updates on schema*
* 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
* 13:50 klausman: rebooting an-worker1096 for cluster maintenance
* 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - [[phab:T258405|T258405]]
* 13:29 moritzm: restarting mw canaries to pick up curl update
* 13:22 moritzm: installing curl security updates on stretch
* 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
* 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
* 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 00m 59s)
* 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 01m 00s)
* 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
* 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
* 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T263284|T263284]])
* 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # [[phab:T262046|T262046]]
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|58a8c8271d75ff477ce0507ac5021edcfc2f6453}}: kuwiktionary: Create Jinûvesazî namespace ([[phab:T262046|T262046]]) (duration: 01m 01s)
* 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
* 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:55 hnowlan: adding buster host restbase1028-b to cassandra
* 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
* 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
* 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
* 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
* 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - [[phab:T264157|T264157]]
* 05:45 marostegui: Stop MySQL on es2011 [[phab:T264261|T264261]]
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
* 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) [[phab:T264109|T264109]]
* 05:19 marostegui: Repool labsdb1011
* 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: {{Gerrit|Ia3357b2f593c}} (duration: 00m 58s)
* 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|1721d2aa0}} - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)
== 2020-09-30 ==
* 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:46 cdanis: depool mw2356 and mw2319
* 21:45 eileen: civicrm revision changed from {{Gerrit|5a53bfe6ed}} to {{Gerrit|256adda03c}}, config revision is {{Gerrit|646817a2c0}}
* 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
* 21:19 ejegg: updated fundraising CiviCRM from {{Gerrit|6e843649ac}} to {{Gerrit|5a53bfe6ed}}
* 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
* 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
* 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
* 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
* 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
* 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 19:01 effie: disable puppet on mw2271 and use onhost memcached - [[phab:T263958|T263958]]
* 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki ([[phab:T257220|T257220]]) (duration: 00m 58s)
* 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki ([[phab:T225027|T225027]]) (duration: 00m 58s)
* 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki ([[phab:T255585|T255585]]) (duration: 00m 58s)
* 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 17:52 bblack: lvs1016: restart pybal
* 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
* 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:21 mutante: re-enabled puppet on install2003
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:20 hnow