
Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set (T266501) (duration: 01m 00s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
 
(316 intermediate revisions by 3 users not shown)
== 2020-10-26 ==
== 2021-10-22 ==
* 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set ([[phab:T266501|T266501]]) (duration: 01m 00s)
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:30 mutante: netflow5001 - systemctl reset-failed
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
* 20:57 bblack: re-pooling eqiad in DNS
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 20:54 legoktm: <XioNoX> I disabled the interface on cr1, going to re-enable the active one on cr2
* 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 20:48 legoktm: bblack has temporarily depooled eqiad https://gerrit.wikimedia.org/r/733043
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 20:41 XioNoX: disable sessions to equinix eqiad IXP
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 19:17 urbanecm: Start server-side upload of 1 video file ([[phab:T294134|T294134]])
* 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 15:06 jbond: upload puppetboard_3.1.0-1_all.deb to bullseye-wikimedia
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 13:42 ema: deployment-cache-upload06: restart varnish-frontend, package got upgraded to 6.0.8 [[phab:T294116|T294116]]
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 13:30 jbond: upload python3-pypuppetdb_2.4.0-1_all.deb to bullseye
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 10:46 jbond: upload cas_6.4.2-1+wmf10u1
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 09:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294029|T294029]]
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS buster
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 08:27 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 08:24 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 08:23 ema: cp3062: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 08:00 ema: deployment-cache-text06: test 0008-vsl_check_e_inval_assertion.patch https://gerrit.wikimedia.org/r/c/operations/debs/varnish4/+/732913/ [[phab:T293879|T293879]]
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17580 and previous config saved to /var/cache/conftool/dbconfig/20211022-055403-root.json
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17579 and previous config saved to /var/cache/conftool/dbconfig/20211022-053900-root.json
* 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17578 and previous config saved to /var/cache/conftool/dbconfig/20211022-052356-root.json
* 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17577 and previous config saved to /var/cache/conftool/dbconfig/20211022-050852-root.json
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17576 and previous config saved to /var/cache/conftool/dbconfig/20211022-045349-root.json
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 04:46 marostegui_: Deploy schema change on s8 codfw - [[phab:T291719|T291719]]
* 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17575 and previous config saved to /var/cache/conftool/dbconfig/20211022-043845-root.json
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 02:59 ejegg: updated payments-wiki from {{Gerrit|088a8cda1e}} to {{Gerrit|6e810fb401}}
* 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
* 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
* 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]]) (duration: 06m 53s)
* 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]])
* 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) ([[phab:T265556|T265556]]) (duration: 00m 57s)
* 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki ([[phab:T264990|T264990]]) (duration: 00m 58s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki ([[phab:T265690|T265690]]) (duration: 00m 58s)
* 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A ([[phab:T265372|T265372]], [[phab:T265556|T265556]]) (duration: 00m 58s)
* 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs ([[phab:T266285|T266285]]) (duration: 00m 58s)
* 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code ([[phab:T71729|T71729]]) (duration: 00m 58s)
* 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist ([[phab:T265558|T265558]]) (duration: 00m 59s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis ([[phab:T265556|T265556]]) (duration: 00m 58s)
* 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 17:48 mutante: an-worker109* - systemctl reset-failed  to clear Icinga alerts related to wmf_auto_restart changes
* 17:45 mutante: releases2002,netmon2001, various other hosts - systemctl reset-failed  to clear Icinga alerts related to wmf_auto_restart changes
* 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: [[phab:T265809|T265809]], {{Gerrit|I1011f63ae61f5a6}} (duration: 01m 00s)
* 16:41 XioNoX: bounce security log on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
* 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
* 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
* 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
* 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
* 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
* 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
* 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
* 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
* 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
* 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
* 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
* 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
* 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki (duration: 16m 43s)
* 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
* 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
* 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
* 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
* 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
* 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
* 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
* 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
* 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
* 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
* 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
* 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
* 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org [[phab:T265857|T265857]]
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bff6b37a55fe8f260fe00cbb942c53101167fb07}}: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T266390|T266390]]) (duration: 01m 14s)
* 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - [[phab:T265911|T265911]]
* 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
* 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - [[phab:T265911|T265911]]
* 10:18 godog: roll restart pybal to apply latest configuration
* 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
* 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
* 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:58 moritzm: installing freetype security updates for stretch
* 08:57 XioNoX: remove down sessions to AS38758
* 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:43 XioNoX: remove down sessions to AS8560
* 08:41 XioNoX: remove down sessions to AS31334
* 08:28 XioNoX: remove down sessions to AS6327
* 08:27 XioNoX: remove down sessions to AS8674
* 08:25 XioNoX: remove down sessions to AS24429
* 08:21 XioNoX: remove down sessions to AS16509
* 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
* 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
* 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
* 06:10 marostegui: Warm up tables [[phab:T261914|T261914]]
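
The db1126 repooling entries logged on 2021-10-22 step through 5%, 10%, 25%, 50%, 75% and 100%. A minimal Python sketch, assuming the entry shape `* HH:MM actor: message` seen in this log, could pull those steps out and measure the gaps between them. The function names and regular expressions are illustrative only and are not part of dbctl or any other Wikimedia tooling; the sample messages are abbreviated copies of the entries above.

```python
import re
from datetime import datetime

# Assumed entry shape, inferred from this log: "* HH:MM actor: message".
ENTRY_RE = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$")
# Matches the quoted dbctl commit text, e.g. 'db1126 (re)pooling @ 25%: After upgrade'.
REPOOL_RE = re.compile(r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%")


def parse_entry(line):
    """Split one log line into time, actor and message, or return None."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def repool_steps(lines):
    """Return (time, host, percent) tuples for repool commits, oldest first."""
    steps = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        match = REPOOL_RE.search(entry["message"])
        if match:
            steps.append((entry["time"], match.group("host"), int(match.group("pct"))))
    # HH:MM strings sort chronologically within a single day.
    return sorted(steps)


def minutes_between(steps):
    """Minutes elapsed between consecutive steps (same day only, no midnight wrap)."""
    times = [datetime.strptime(time, "%H:%M") for time, _, _ in steps]
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(times, times[1:])]


# Abbreviated copies of three entries from the log (newest first, as logged).
sample = [
    "* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade'",
    "* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade'",
    "* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade'",
]
steps = repool_steps(sample)
print(steps)
print(minutes_between(steps))
```

For these three entries the gaps come out at 15 minutes each, which matches the timestamps in the log.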


== 2020-10-25 ==
== 2021-10-21 ==
* 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 dwisehaupt: kernel upgrade and reboot for fran1001
* 23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:37 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:36 jforrester@deploy1002: Synchronized multiversion/: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:33 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:730946{{!}}CommonSettings: Drop legacy CentralAuth config flag, never read (T277932)]] (duration: 00m 55s)
* 23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:42 mutante: [[phab:T294038|T294038]] [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created.  . .Successfully sent email
* 21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
* 21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
* 20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
* 20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container ([[phab:T293050|T293050]]) (duration: 00m 55s)
* 19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
* 19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
* 19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
* 19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
* 18:53 urbanecm: Deploy security patch for [[phab:T285116|T285116]] (wmf.4, wmf.5)
* 18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
* 17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
* 17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on [[phab:T294010|T294010]] (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
* 17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (2/3) (duration: 00m 54s)
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (2/3) (duration: 00m 55s)
* 15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (1/3) (duration: 00m 57s)
* 15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 54s)
* 15:12 Lucas_WMDE: my next message accidentally says 1/3 again but it’s 2/3, sorry
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 14:42 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/Wikibase.default.php: Backport: [[gerrit:732666{{!}}Enable dispatching via jobs by default (T291828)]] (duration: 00m 55s)
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client/: Backport: [[gerrit:732674{{!}}Fix ExternalUserNames service wiring for local database]] (duration: 00m 57s)
* 14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:56 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:55 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:49 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:34 volans: uploaded spicerack_1.0.6 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: Schema change s3 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s4 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T278619|T278619]]
* 11:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T278619|T278619]]
* 11:13 Lucas_WMDE: UTC morning backport+config window done
* 11:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/ResubmitChanges.php wikidatawiki --minimum-age $((60*60*12)) # [[phab:T294008|T294008]]
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730848{{!}}Configure event stream for map tiles state change (T289771)]] (duration: 01m 04s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:47 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:14 jbond: merging refactor of P:base Gerrit:714975
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:56 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 03s)
* 08:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:26 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3062.esams.wmnet,service=(varnish-fe{{!}}ats-tls)
* 08:25 ema: cp3062: revert vsl_space experiment [[phab:T293879|T293879]]
* 08:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1004.eqiad.wmnet with OS bullseye
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17563 and previous config saved to /var/cache/conftool/dbconfig/20211021-080330-root.json
* 07:56 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1004.eqiad.wmnet with OS bullseye
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17562 and previous config saved to /var/cache/conftool/dbconfig/20211021-074826-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17561 and previous config saved to /var/cache/conftool/dbconfig/20211021-073323-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17560 and previous config saved to /var/cache/conftool/dbconfig/20211021-071819-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17559 and previous config saved to /var/cache/conftool/dbconfig/20211021-070315-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17558 and previous config saved to /var/cache/conftool/dbconfig/20211021-064812-root.json
* 06:35 elukey: `systemctl reload nginx` on cloudelastic100[5,6] to pick up the new TLS certificate and clear alerts - [[phab:T293826|T293826]]
* 04:47 marostegui: Deploy schema change on s5 codfw - [[phab:T291719|T291719]]
* 04:37 marostegui: Deploy schema change on s6 codfw - [[phab:T291719|T291719]]
* 04:04 legoktm: restarted apache on lists1001 so it only uses new TLS cert ([[phab:T293826|T293826]])
* 03:29 eileen: civicrm revision changed from {{Gerrit|e889831012}} to {{Gerrit|733a8fceda}}, config revision is {{Gerrit|eed79486d5}}
* 00:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-10-23 ==
== 2021-10-20 ==
* 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins ([[phab:T266086|T266086]])
* 23:56 thcipriani@deploy1002: Finished scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]] (duration: 11m 41s)
* 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:44 thcipriani@deploy1002: Started scap: Backport: [[gerrit:732336{{!}}Restore title to mobile skin without logo (T290525)]]
* 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ [[phab:T266155|T266155]]
* 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - [[phab:T266023|T266023]] (forgot to log this earlier)
* 23:29 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fawiki require login for creation of pages in the draft namespace [[phab:T291018|T291018]] (duration: 01m 02s)
* 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - [[phab:T257905|T257905]]
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
* 23:27 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fawiki require login to edit main namespace [[phab:T291018|T291018]] (duration: 01m 04s)
* 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes  [[phab:T264388|T264388]]
* 22:13 dancy@deploy1002: Synchronized README: testing (4/4) (duration: 02m 52s)
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:00 dancy@deploy1002: Synchronized README: testing (3/4) (duration: 02m 57s)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 21:54 dancy@deploy1002: Synchronized README: testing (2) (duration: 01m 02s)
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 dancy@deploy1002: Synchronized README: (no justification provided) (duration: 01m 03s)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 21:50 dancy: Testing a series of one-file scap sync-file runs
* 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9cf996a38d82fdd67e600a5a951e88423957e8d}}: Promote Growth features out of darkmode on several wikis  ([[phab:T291826|T291826]], [[phab:T255037|T255037]], [[phab:T287878|T287878]]) (duration: 01m 04s)
* 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - [[phab:T257905|T257905]]
* 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:38 eileen: civicrm revision changed from {{Gerrit|9b5e0d015b}} to {{Gerrit|e889831012}}, config revision is {{Gerrit|eed79486d5}}
* 20:25 legoktm: uploaded php7.4 on buster to apt.wm.o ([[phab:T293449|T293449]])
* 19:24 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations (duration: 00m 46s)
* 19:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@985a139]: bulk_daemon: detect cross-cluster config from old and new locations
* 19:09 mutante: disabling puppet on mw* for a minute to deploy a change
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:31 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:30 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:24 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:28 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org -  [[phab:T293810|T293810]]
* 17:27 mutante: [krb1001:~] $ sudo manage_principals.py create statwithlatte --email_address=naray-ctr@wikimedia.org
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:01 razzi@deploy1002: Finished deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f] (duration: 23m 42s)
* 17:00 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/client: Update deprecated calls to ParserOutput in ShortDescHandler - [[phab:T293860|T293860]] (duration: 01m 03s)
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:53 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/LiquidThreads/pages/LqtDiscussionPager.php: Remove deprecated usage of setProperty - [[phab:T293895|T293895]] (duration: 01m 03s)
* 16:49 hashar@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/GeoCrumbs: Replace use of deprecated ParserOutput:getProperty() - [[phab:T293894|T293894]] (duration: 01m 09s)
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 razzi@deploy1002: Started deploy [analytics/refinery@9e3295f]: Regular analytics weekly train [analytics/refinery@9e3295f]
* 16:36 razzi: deploy refinery change for https://phabricator.wikimedia.org/T287084
* 16:13 jbond: upload cas_6.4.2-1_amd64.deb
* 15:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:39 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:57 moritzm: installing modsecurity-crs security updates on Buster
* 14:48 moritzm: installing xmlgraphics-commons security updates on Buster
* 14:46 moritzm: installing irssi security updates on Buster
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:35 moritzm: installing commons-io security updates on Buster
* 14:27 ema: cp3062: test higher vsl_space values [[phab:T293879|T293879]]
* 14:27 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:12 moritzm: installing ruby2.3 security updates
* 13:40 moritzm: installing apache2 security updates on buster
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 01m 02s)
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277116|T277116]]
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=ats-tls
* 13:04 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet,service=varnish-fe
* 12:51 ema: cp3062: bump vsl_space from 80M (default) to 512M [[phab:T293879|T293879]] - varnish restart needed
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 12:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277116|T277116]]
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:02 urbanecm@deploy1002: Finished scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]]) (duration: 25m 19s)
* 11:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2007.codfw.wmnet
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2007.codfw.wmnet
* 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:37 urbanecm@deploy1002: Started scap: {{Gerrit|802d3b7}}: {{Gerrit|e4f7f85}}: CreateAccountCampaign: Support for recurring donors ([[phab:T293699|T293699]])
* 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2005.codfw.wmnet
* 11:21 moritzm: installing ffmpeg security updates
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e520fc57411bb19123766192cd636396ea6fc59d}}: GrowthExperiments: Add campaign pattern for enwiki ([[phab:T293699|T293699]]) (duration: 01m 22s)
* 11:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2005.codfw.wmnet
* 10:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 10:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277116|T277116]]
* 09:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 09:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277116|T277116]]
* 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277116|T277116]]
* 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277116|T277116]]
* 08:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 08:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277116|T277116]]
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1118.eqiad.wmnet with OS buster
* 07:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1118.eqiad.wmnet with OS buster
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 (s1) for reimage [[phab:T290865|T290865]]', diff saved to https://phabricator.wikimedia.org/P17552 and previous config saved to /var/cache/conftool/dbconfig/20211020-064529-marostegui.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS buster
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 (s1) after upgrade', diff saved to https://phabricator.wikimedia.org/P17551 and previous config saved to /var/cache/conftool/dbconfig/20211020-063926-marostegui.json
* 06:35 marostegui: Upgrade db1106
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17550 and previous config saved to /var/cache/conftool/dbconfig/20211020-063431-marostegui.json
* 06:31 dcausse: restarting blazegraph on wdqs1012
* 06:28 elukey: reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage
* 06:21 marostegui: Depool clouddb1013 for upgrade
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS buster
* 06:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17549 and previous config saved to /var/cache/conftool/dbconfig/20211020-061202-marostegui.json
* 06:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:05 XioNoX: put transport link between ulsfo and eqsin in service - [[phab:T273308|T273308]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS buster
* 05:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:40 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis ([[phab:T288848|T288848]]) (duration: 01m 05s)
* 01:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 tgr: west coast evening deploys done


== 2020-10-22 ==
== 2021-10-19 ==
* 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - [[phab:T257940|T257940]]
* 23:59 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732103{{!}}Reorder some wikis at wgExtraNamespaces and wmgVisualEditorAvailableNamespaces (T293846)]] (duration: 01m 02s)
* 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded [[phab:T265963|T265963]]
* 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:56 mutante: deploy1002 - scap pull  and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver ([[phab:T265963|T265963]])
* 23:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:732053{{!}}ruwikiversity: Add 'portal' and 'faculty' namespaces (T293545)]] (duration: 01m 03s)
* 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710565{{!}}Set the project namespace and sitename for Javanese Wikipedia and Wiktionary (T287437)]] (duration: 01m 02s)
* 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:23 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731953{{!}}Create Portal and Portal talk namespace for shiwiki (T288909)]] (duration: 01m 03s)
* 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
* 23:13 tgr@deploy1002: Synchronized static: Config: [[gerrit:731231{{!}}Repair the size of the logo of Kashmiri Wikipedia (T293342)]] (duration: 02m 14s)
* 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet [[phab:T265963|T265963]]
* 21:34 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete {{!}} fixed Icinga alert:  RECOVERY - Check for large files in client bucket on mwmaint1002 is OK: OK: [[phab:T165885|T165885]]
* 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand  to group1 ([[phab:T123582|T123582]]) (duration: 00m 56s)
* 21:32 mutante: mwmaint1002 - delete large files over 100MB from puppet clientbucket. sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size +100M -delete
* 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - [[phab:T258729|T258729]]
* 20:56 ejegg: updated payments-wiki from {{Gerrit|0f48acea49}} to {{Gerrit|30e596903d}}
* 18:07 chaomodus: Updating eqiad public network DNS to automation
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - [[phab:T258729|T258729]]
* 18:46 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/MediaSearch/: {{Gerrit|a84a675}}: {{Gerrit|3231578}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
* 18:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/: {{Gerrit|694580a}}: {{Gerrit|c02e301}}: MediaSearch backports ([[phab:T291392|T291392]], [[phab:T293335|T293335]], [[phab:T291392|T291392]], [[phab:T291622|T291622]], [[phab:T293554|T293554]]) (duration: 01m 03s)
* 17:46 chaomodus: Updating eqiad private network DNS to automation
* 18:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:30 foks: deleting 1 more email with deleteUserEmail.php
* 17:21 bd808@cumin1001: Added views for new wiki: smnwiki [[phab:T264900|T264900]]
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1476a2d93}}: {{Gerrit|dd8393c1a0}}: foundationwiki: Restrict sensitive namespaces to editor group ([[phab:T205350|T205350]]) (duration: 01m 03s)
* 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a2893c7190e615a247674dbf7f87348bf43b91c}}: Enable topic subscriptions as a beta feature on all remaining projects ([[phab:T287802|T287802]]) (duration: 01m 04s)
* 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:00 legoktm@deploy1002: Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (2/2) (duration: 01m 06s)
* 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
* 17:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy ([[phab:T288848|T288848]]) (1/2) (duration: 01m 05s)
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:57 foks: removing six email addresses on request (with deleteUserEmail.php)
* 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
* 17:25 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 14:05 ottomata: bump camus version to wmf12 for all camus jobs.  should be no-op now. - [[phab:T251609|T251609]]
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bullseye
* 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - [[phab:T251609|T251609]] (duration: 01m 02s)
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bullseye
* 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 [[phab:T264388|T264388]]
* 16:48 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:41 moritzm: pooling ldap-replica1001/1002 [[phab:T264388|T264388]]
* 16:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:10 moritzm: depooling ldap-replica2001/2002 [[phab:T264388|T264388]]
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
* 16:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 13:01 moritzm: pooling ldap-replica2004 [[phab:T264388|T264388]]
* 16:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 7 hosts with reason: Schema change s3 [[phab:T277118|T277118]]
* 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - [[phab:T251609|T251609]] (duration: 01m 05s)
* 16:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|52ad2d4df1164dced684231c12aa64bd028b8ac9}}: Do not log logins at loginwiki via CU ([[phab:T253802|T253802]]) (duration: 01m 06s)
* 16:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Schema change s1 [[phab:T277118|T277118]]
* 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 11:59 Lucas_WMDE: EU backport&config window done
* 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Schema change s4 [[phab:T277118|T277118]]
* 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 2/2 (duration: 01m 04s)
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 1/2 (duration: 01m 02s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T277118|T277118]]
* 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; [[phab:T246539|T246539]])
* 15:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
* 15:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s2 [[phab:T277118|T277118]]
* 11:38 marostegui: Compare s1-s8 tables - [[phab:T261914|T261914]]
* 15:40 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - remove now redundant stream setting - [[phab:T277193|T277193]] (duration: 01m 04s)
* 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: [[gerrit:635813{{!}}Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php]] (duration: 01m 08s)
* 15:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s5 [[phab:T277118|T277118]]
* 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
* 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
* 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 10:43 moritzm: installing freetype security updates for stretch/buster
* 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Schema change s6 [[phab:T277118|T277118]]
* 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:30 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:28 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
* 15:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 08:31 kormat: enabling replication from eqiad to codfw [[phab:T261914|T261914]]
* 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 14:29 jbond: disable puppet on lvs, cp, authdns, mc, mw-be and wcqs while I merge G:662699
* 03:37 eileen: civicrm revision changed from {{Gerrit|4dce7bf535}} to {{Gerrit|bb7c08bf6d}}, config revision is {{Gerrit|9a522d03dd}}
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:13 eileen: civicrm revision changed from {{Gerrit|3c3dcf80ae}} to {{Gerrit|4dce7bf535}}, config revision is {{Gerrit|9a522d03dd}}
* 14:11 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]] (duration: 45m 13s)
* 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
* 13:52 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
* 13:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52
* 13:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:26 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.5  refs [[phab:T281169|T281169]]
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17547 and previous config saved to /var/cache/conftool/dbconfig/20211019-131927-root.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17546 and previous config saved to /var/cache/conftool/dbconfig/20211019-131651-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17545 and previous config saved to /var/cache/conftool/dbconfig/20211019-130424-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17544 and previous config saved to /var/cache/conftool/dbconfig/20211019-130147-root.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17543 and previous config saved to /var/cache/conftool/dbconfig/20211019-124920-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17542 and previous config saved to /var/cache/conftool/dbconfig/20211019-124644-root.json
* 12:40 moritzm: installing aftpd security updates
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17541 and previous config saved to /var/cache/conftool/dbconfig/20211019-123416-root.json
* 12:34 marostegui: Upgrade dbstore1003
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17540 and previous config saved to /var/cache/conftool/dbconfig/20211019-123140-root.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17539 and previous config saved to /var/cache/conftool/dbconfig/20211019-121913-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17538 and previous config saved to /var/cache/conftool/dbconfig/20211019-121636-root.json
* 12:12 XioNoX: push anycast tuning to all Lumen and NTT transit links - [[phab:T288843|T288843]]
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1167 (s8) after upgrade', diff saved to https://phabricator.wikimedia.org/P17537 and previous config saved to /var/cache/conftool/dbconfig/20211019-120918-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 (s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17536 and previous config saved to /var/cache/conftool/dbconfig/20211019-120458-marostegui.json
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17535 and previous config saved to /var/cache/conftool/dbconfig/20211019-120409-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17534 and previous config saved to /var/cache/conftool/dbconfig/20211019-120348-root.json
* 12:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/WikibaseMediaInfo/: {{Gerrit|ec0125770775c1a1a54c3b592d86d287fd9e3ad6}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 55s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17533 and previous config saved to /var/cache/conftool/dbconfig/20211019-120132-root.json
* 12:00 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikibaseMediaInfo/: {{Gerrit|79808a90a95dd5dac2b532b87fb7ec1a490ea0f0}}: Escape captions when writing stored data into js state ([[phab:T293556|T293556]]) (duration: 00m 56s)
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17532 and previous config saved to /var/cache/conftool/dbconfig/20211019-120024-root.json
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 XioNoX: push anycast tuning to Tele2, Init7, DT transit links - [[phab:T288843|T288843]]
* 11:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17531 and previous config saved to /var/cache/conftool/dbconfig/20211019-114844-root.json
* 11:46 marostegui: Upgrade db1105 (s1,s2)
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) for upgrade', diff saved to https://phabricator.wikimedia.org/P17530 and previous config saved to /var/cache/conftool/dbconfig/20211019-114649-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17529 and previous config saved to /var/cache/conftool/dbconfig/20211019-114520-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17527 and previous config saved to /var/cache/conftool/dbconfig/20211019-113340-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17526 and previous config saved to /var/cache/conftool/dbconfig/20211019-113017-root.json
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17525 and previous config saved to /var/cache/conftool/dbconfig/20211019-111837-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17524 and previous config saved to /var/cache/conftool/dbconfig/20211019-111513-root.json
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c31b04e50101a60db7ae8acae64bc031f5e1007}}: DPL: Explicitly note it is not possible to enable DPL on any more wikis (duration: 00m 55s)
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17523 and previous config saved to /var/cache/conftool/dbconfig/20211019-110333-root.json
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17522 and previous config saved to /var/cache/conftool/dbconfig/20211019-110009-root.json
* 10:56 marostegui: Upgrade clouddb1021
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 10:51 moritzm: failover master in ganeti-test to ganeti2026
* 10:50 godog: bounce superset on an-tool1005 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2112.codfw.wmnet with OS stretch
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17521 and previous config saved to /var/cache/conftool/dbconfig/20211019-104829-root.json
* 10:45 godog: bounce navtiming on webperf1001 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 godog: bounce superset on an-tool1010 to pick up statsd changes - [[phab:T247963|T247963]]
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17520 and previous config saved to /var/cache/conftool/dbconfig/20211019-104506-root.json
* 10:38 oblivian@deploy1002: Synchronized w/static.php: Config: [[gerrit:730182{{!}}static.php: Add support for /static/current rewrites (take 2) (T285232)]] (duration: 00m 55s)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 10:37 marostegui: Upgrade db1101 (s7,s8)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17519 and previous config saved to /var/cache/conftool/dbconfig/20211019-103634-marostegui.json
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:22 oblivian@deploy1002: Synchronized tests/WmfConfigServicesTest.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:22 godog: flip mw statsd traffic with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/731918 - [[phab:T247963|T247963]]
* 10:21 oblivian@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:731918{{!}}ProductionServices: use graphite2003 for statsd (T247963)]] (duration: 00m 54s)
* 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS stretch
* 10:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:50 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 09:44 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.3 (duration: 01m 39s)
* 09:42 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.2 (duration: 16m 06s)
* 09:37 godog: move graphite/statsd writes to graphite2003 - [[phab:T247963|T247963]]
* 09:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 09:27 hashar: scap clean --delete 1.38.0-wmf.2 && scap clean --delete 1.38.0-wmf.3  # [[phab:T281169|T281169]]
* 09:27 hashar: Cloned and applied security patches for 1.38.0-wmf.5 # [[phab:T281169|T281169]]
* 09:19 marostegui: Stop slave on db2112 [[phab:T290865|T290865]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 14 hosts with reason: Schema change s1 [[phab:T281058|T281058]]
* 09:03 XioNoX: push anycast tuning to all Telia transit links - [[phab:T288843|T288843]]
* 08:50 godog: point graphite.discovery.wmnet to graphite2003 - [[phab:T247963|T247963]]
* 08:40 XioNoX: push prep-work for anycast tuning to all sites - [[phab:T288843|T288843]]
* 08:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Schema change s8 [[phab:T281058|T281058]]
* 08:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php hrwiki --fix
* 08:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift
* 08:07 mvernon@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=swift-ro
* 08:03 XioNoX: push prep-work for anycast tuning in ulsfo (try 2) - [[phab:T288843|T288843]]
* 08:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:24 ema: A:cp start rolling varnish upgrades to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17517 and previous config saved to /var/cache/conftool/dbconfig/20211019-072111-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17516 and previous config saved to /var/cache/conftool/dbconfig/20211019-071519-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17515 and previous config saved to /var/cache/conftool/dbconfig/20211019-070607-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17514 and previous config saved to /var/cache/conftool/dbconfig/20211019-070016-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17513 and previous config saved to /var/cache/conftool/dbconfig/20211019-065104-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17512 and previous config saved to /var/cache/conftool/dbconfig/20211019-064512-root.json
* 06:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2112.codfw.wmnet with OS buster
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17511 and previous config saved to /var/cache/conftool/dbconfig/20211019-063559-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17510 and previous config saved to /var/cache/conftool/dbconfig/20211019-063008-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17509 and previous config saved to /var/cache/conftool/dbconfig/20211019-062054-root.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17508 and previous config saved to /var/cache/conftool/dbconfig/20211019-061505-root.json
* 06:06 marostegui: Upgrade dbstore1005
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17507 and previous config saved to /var/cache/conftool/dbconfig/20211019-060551-root.json
* 06:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:03 marostegui: Upgrade db1184, db1178
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 for upgrade', diff saved to https://phabricator.wikimedia.org/P17506 and previous config saved to /var/cache/conftool/dbconfig/20211019-060123-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17505 and previous config saved to /var/cache/conftool/dbconfig/20211019-060001-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184 for upgrade', diff saved to https://phabricator.wikimedia.org/P17504 and previous config saved to /var/cache/conftool/dbconfig/20211019-055429-marostegui.json
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2112.codfw.wmnet with OS buster
* 05:46 marostegui: Reimage db2112 (s1 codfw master) [[phab:T290865|T290865]]
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2020-10-21 ==
== 2021-10-18 ==
* 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: [[phab:T266033|T266033]] (duration: 01m 05s)
* 23:40 hoo: Updated the Wikidata property suggester with data from the 2021-10-04 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: [[phab:T265751|T265751]] [[phab:T265754|T265754]] (duration: 01m 08s)
* 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b654980240d51fff3c6e9c48f7076d4609c2560f}}: Create an alias for the Draft namespace on hrwiki ([[phab:T291755|T291755]]) (duration: 00m 56s)
* 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting ([[phab:T257940|T257940]], [[phab:T257906|T257906]])
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 23:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=thwiktionary --fix # [[phab:T291761|T291761]]
* 18:13 Urbanecm: Morning B&C window done
* 23:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abe777d28594da852e49ccb1c1597b2598f3e483}}: Create Rhymes namespace for thwiktionary ([[phab:T291761|T291761]]) (duration: 00m 57s)
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|45312d359442d274e83deb7be80f86e12fb9e864}}: [WikibaseMediaInfo] Fix concept chips array nesting structure ([[phab:T256431|T256431]]) (duration: 01m 05s)
* 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mepps: updated payments-wiki-staging from {{Gerrit|db03677b2d}} to {{Gerrit|5fdd29bc16}}
* 23:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d94e33ff39b300c74fcaf08d1746c089fb1af783}}: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
* 22:56 legoktm@deploy1002: Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests ([[phab:T288848|T288848]]) (duration: 00m 56s)
* 17:56 XioNoX: configure FB PNI in eqdfw
* 22:06 maryum: deployed security patch for [[phab:T293589|T293589]]
* 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, [[phab:T266021|T266021]] (duration: 01m 06s)
* 21:23 maryum: deployed security patch for [[phab:T293556|T293556]]
* 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
* 21:05 mutante: mwmaint1002 - sudo -u www-data /usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/TranslationNotifications/scripts/DigestEmailer.php --wiki mediawikiwiki {{!}} Fatal error: Uncaught Error: Class 'MediaWiki\MediaWikiServices' not found
* 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
* 20:58 mutante: mwmaint1002 - attempt to start mediawiki_job_translationnotifications-mediawikiwiki which was alerting as failed
* 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:29 mutante: LDAP: removed non-existent user gerrit2 from group labsadminbots ([[phab:T160122|T160122]])
* 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
* 19:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/resources/store/state.js: {{Gerrit|ac7b4fc2ccc69589e00a42f49d18a8f6d71777f2}}: Revert 727328 ([[phab:T293554|T293554]]) (duration: 00m 56s)
* 16:46 effie: restart php-fpm and pool mw2252 and mw2328
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 19:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Convert $wgEventStreams to be an associative array - [[phab:T277193|T277193]] (duration: 00m 57s)
* 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
* 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia [[phab:T264388|T264388]]
* 18:07 mutante: gerrit - removed tonina from wmde-mediawiki gerrit group ([[phab:T293621|T293621]])
* 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
* 17:51 mutante: puppet run on all bastion hosts via cumin
* 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
* 15:32 mvernon@cumin2002: START - Cookbook sre.discovery.service-route
* 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 15:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler [[phab:T242554|T242554]] (duration: 01m 07s)
* 15:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: Schema change s3 [[phab:T281058|T281058]]
* 14:34 dcausse: restarting blazegraph on codfw servers ([[phab:T263952|T263952]])
* 15:16 herron: reprepro copied anycast-healthchecker, python3-json-logger and python3-anycast-healthchecker from buster-wikimedia to bullseye-wikimedia [[phab:T292196|T292196]]
* 13:21 moritzm: pooling ldap-replica2003 [[phab:T264388|T264388]]
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
* 15:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 13 hosts with reason: Schema change s4 [[phab:T281058|T281058]]
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
* 14:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 11:40 matthiasmullie: EU B&C done
* 14:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Schema change s7 [[phab:T281058|T281058]]
* 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
* 14:54 herron: rebuilt and uploaded kafkatee for bullseye [[phab:T292196|T292196]]
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|785404fa2b998947d236aebe481ee1abcbd14220}}: Disable registrations stat on Special:TranslationStats ([[phab:T264158|T264158]]) (duration: 01m 05s)
* 14:50 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11567427c3f7d2908b29046ee56a7b0c0da32c09}}: Enable ContentTranslation in 5 Wikipedias as a default tool ([[phab:T264737|T264737]]; [[phab:T264738|T264738]]; [[phab:T264739|T264739]]; [[phab:T264740|T264740]]; [[phab:T264741|T264741]]) (duration: 01m 30s)
* 14:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 11:00 marostegui: Upgrade db2093's mariadb version [[phab:T266003|T266003]]
* 14:36 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731346{{!}}[beta] Rename $wgIPInfoGeoIP2Path to $wgIPInfoGeoIP2Prefix (T289361)]] (duration: 00m 56s)
* 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; [[phab:T246539|T246539]])
* 14:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 14:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - [[phab:T258405|T258405]]
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (2/2) (duration: 00m 56s)
* 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:731015{{!}}Remove wmg variables for dispatch via jobs (T291828)]] (1/2) (duration: 00m 56s)
* 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:731014{{!}}Unconditionally enable Wikibase dispatching via jobs (T291828)]] (duration: 00m 56s)
* 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; [[phab:T246539|T246539]]
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS buster
* 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 11:55 Lucas_WMDE: UTC morning backport window done
* 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (2/2) (duration: 00m 56s)
* 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; [[phab:T246539|T246539]]
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730748{{!}}Remove $wmgWikibaseDispatchViaJobsAllowedClients (T291828)]] (1/2) (duration: 00m 56s)
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # [[phab:T246539|T246539]]
* 11:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS buster
* 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:49 marostegui: Reimage db2079 (codfw s8 master) [[phab:T290868|T290868]]
* 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730747{{!}}Set dispatchViaJobsAllowedClients to null everywhere (T291828)]] (duration: 00m 56s)
* 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - [[phab:T266001|T266001]]
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - [[phab:T266001|T266001]]
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
* 11:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731239{{!}}Make deduplication actually work for DispatchChangesJob (T291118)]] (duration: 00m 55s)
* 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/Hooks/RecentChangeSaveHookHandler.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (2/2) (duration: 00m 56s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:731238{{!}}Create DispatchChangesJob without change id (T291118)]] (duration: 00m 56s)
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:47 moritzm: copied wmf-certificates from buster-wikimedia to stretch-wikimedia in reprepro
* 10:38 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:731237{{!}}Don't filter by change Id when dispatching to client wikis ()]] (duration: 00m 59s)
* 09:48 moritzm: installing node-tar security updates on buster
* 09:39 vgutierrez: updating acme-chief to version 0.34 on acmechief instances - [[phab:T292619|T292619]]
* 09:38 godog: sync metrics from graphite1004 to graphite2003 - [[phab:T247963|T247963]]
* 09:13 moritzm: installing apr security updates on bullseye
* 08:57 godog: cleanup graphite metrics not modified for >= ~3yr (1024 days)
* 07:34 ema: cp3060 (text), cp3061 (upload): upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 07:34 elukey: depool + restart blazegraph on wdqs1013
* 07:01 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-10-16 ==
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2020-10-20 ==
* 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
* 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
* 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
* 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
* 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
* 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact is that new updates will be delayed, but queries will still keep serving as normal, so fixing this is a priority but note that there's no availability outage
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
* 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:48 effie: depooling mw2328 - [[phab:T266052|T266052]]
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
* 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
* 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|fee2d3be13ae14d7ea51ff2db42090a1c27819bf}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 03s)
* 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|00ef00f59fd2a7a1366161ccc66c260be20e3e50}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 01s)
* 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: {{Gerrit|5eee9b773338e5181867cabec9faefbdeacf67ca}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 06s)
* 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: {{Gerrit|5f8d3de14c116b618f5226419082d5c9a07766fb}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 09s)
* 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
* 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - [[phab:T266001|T266001]]
* 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
* 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
* 11:37 liw: 1.36.0-wmf.14 was branched at {{Gerrit|1b7b5f716015f9303d37158820dadf759e8db707}} for [[phab:T263180|T263180]]
* 11:35 Lucas_WMDE: EU backport/config window done
* 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: [[gerrit:635030{{!}}SearchSatisfaction: Set isAnon field (T259250)]] (duration: 00m 57s)
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634039{{!}}Set Wikidata MF to collapse sections by default (T239195)]] (duration: 00m 56s)
* 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634938{{!}}Remove noratelimit from Wikidata bot group (T258354)]] (duration: 00m 56s)
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 09:59 dcausse: [[phab:T255399|T255399]]: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .


== 2021-10-15 ==
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 22:34 mutante: apt2001 - upgraded nginx
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:20 urbanecm: Start server-side upload for 1 video file
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:07 brennen: end of UTC late backport & config training window


== 2020-10-19 ==
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
* 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
* 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed [[phab:T265490|T265490]]
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
* 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
* 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:41 eileen: drush vset match_on_import 1
* 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
* 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
* 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
* 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
* 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
* 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
* 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:33 mutante: wtp2001 - sudo confctl decommission
* 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
* 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki ([[phab:T243445|T243445]], [[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18902aa75efafb7d56ca347c12781dbe59f2f8ad}}: Change votewiki language temporarily to fa for fawiki elections ([[phab:T262689|T262689]]) (duration: 00m 56s)
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki ([[phab:T243445|T243445]]) (duration: 00m 57s)
* 18:29 tzatziki: removing 10 files for legal compliance
* 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present ([[phab:T265654|T265654]]) (duration: 00m 58s)
* 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users ([[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature ([[phab:T254349|T254349]]) (duration: 00m 56s)
* 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
* 15:31 elukey: update puppet compilers' facts
* 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
* 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
* 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
* 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
* 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` ([[phab:T246539|T246539]]; wikis.dblist is medium wikis from group2.dblist)
* 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1  for buster-wikimedia  [[phab:T264388|T264388]]
* 12:48 moritzm: installing httpcomponents-client security updates on Buster
* 12:26 Urbanecm: Creation of smnwiki is done ([[phab:T264859|T264859]])
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
* 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - [[phab:T264900|T264900]]
* 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:15 marostegui: Deploy schema change on smnwiki [[phab:T265321|T265321]] [[phab:T264900|T264900]]
* 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki ([[phab:T264859|T264859]])
* 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
* 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
* 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` ([[phab:T246539|T246539]])
* 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:24 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce92c9814bf9c12cab1a9592dfb32f935d255d93}}: Restore bureaucrat abilities at uzwiki ([[phab:T265746|T265746]]) (duration: 00m 56s)
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26b97261f2b9d1991ea08fe32b6007ba6fe5088f}}: Disable EditorJourney (UnderstandingFirstDay) ([[phab:T252391|T252391]]) (duration: 01m 10s)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis ([[phab:T246539|T246539]])
* 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 ([[phab:T246539|T246539]])
* 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$  mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # [[phab:T246539|T246539]]
* 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
* 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - [[phab:T259780|T259780]]
* 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
* 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - [[phab:T263616|T263616]]
* 08:01 godog: re-enable compaction for prometheus[12]003 - [[phab:T261281|T261281]]
* 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
* 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
* 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27


== 2020-10-17 ==
== 2021-10-14 ==
* 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=. # [[phab:T264529|T264529]]
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 22:31 mutante: depooling mw1452 for testing
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 18:41 urbanecm: UTC evening B&C done
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 17:42 rzl: depool mw1452 for training
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:33 moritzm: installing node-ansi-regex security updates
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 14:43 jbond: migrate apt.w.o to a dns active/passive discovery address (cc moritzm)
* 14:23 moritzm: installing krb5 security updates on KDCs
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backups
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backups
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2020-10-16 ==
== 2021-10-13 ==
* 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 17:43 thcipriani: restarting gerrit due to gc thrashing
* 21:47 foks: removing 8 files for legal compliance
* 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
* 21:03 foks: removing 2 files for legal compliance
* 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:41 effie: pooling mw2279.codfw.wmnet [[phab:T264698|T264698]]
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping [[phab:T265571|T265571]] (duration: 01m 12s)
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( [[phab:T285867|T285867]])
* 09:03 XioNoX: eqsin, push CR 634473
* 19:08 mutante: gitlab2001 - started workhorse which was for some reason marked as down after restore command ran
* 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfirmed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}: Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 02:14 eileen: civicrm revision changed from {{Gerrit|585eb835d8}} to {{Gerrit|3c3dcf80ae}}, config revision is {{Gerrit|f76d7849bc}}
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:48 moritzm: reverted to clean package state on deneb
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2020-10-15 ==
== 2021-10-12 ==
* 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: {{Gerrit|I245e84e0b8c}} (duration: 01m 10s)
* 23:16 urbanecm: UTC late B&C window done
* 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* ([[phab:T265558|T265558]])
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 ([[phab:T263179|T263179]])
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 29s)
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 51s)
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector ([[phab:T264339|T264339]]) (duration: 01m 51s)
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools ([[phab:T264339|T264339]]) (duration: 01m 43s)
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" ([[phab:T256173|T256173]]) (duration: 01m 58s)
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong ([[phab:T265560|T265560]]) (duration: 02m 07s)
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) [[phab:T265558|T265558]]
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:12 moritzm: installing rsync bugfix updates
* 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: {{Gerrit|fd94002cf6070180a289296ec65ad224e5a0ae67}}: Revert "Validate username input before constructing subpage links" ([[phab:T265606|T265606]]) (duration: 02m 48s)
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 [[phab:T264074|T264074]]
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - [[phab:T265579|T265579]]
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 [[phab:T264074|T264074]]
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T264074|T264074]]
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T264074|T264074]]
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 10:47 vgutierrez: restart ats-backend on cp3050
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 10:00 akosiaris: [[phab:T264209|T264209]]. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
* 11:34 urbanecm: UTC morning B&C window done
* 00:36 eileen: tools revision changed from {{Gerrit|d4e08c52de}} to {{Gerrit|a2a91d6c6a}}
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 00:24 twentyafterfour: phabricator update was uneventful
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 twentyafterfour: updating phabricator
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 07:22 moritzm: installing RT security updates
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2020-10-14 ==
== 2021-10-11 ==
* 23:35 foks: Removing one further file for legal compliance
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:28 foks: Removing nine files for legal compliance
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
* 12:53 moritzm: install apache security updates on buster
* 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: [[gerrit:634002{{!}}Make attribution source logic more defensive]] [[phab:T263599|T263599]] (duration: 01m 05s)
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 ([[phab:T123582|T123582]]) (duration: 01m 03s)
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: [[gerrit:634086{{!}}Stylesheet needs to be compatible with cached HTML]] [[phab:T265543|T265543]] (duration: 01m 07s)
* 12:04 moritzm: install apache security updates on bullseye
* 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: [[phab:T263179|T263179]])
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 19:14 mutante: depooling 5 of the older parsoid servers in codfw
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # [[phab:T265347|T265347]]
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6a56bb7fb762c53db5965f2698a93db2433d33d}}: Add rollbacker right on uzwiki ([[phab:T265509|T265509]]) (duration: 01m 04s)
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0da89998e4e380f3ebe527a42a47dc66c49ee4d2}}: Add spamblacklistlog as a default right for the CU log user ([[phab:T239288|T239288]]) (duration: 01m 05s)
* 08:38 moritzm: updated bullseye d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - [[phab:T255138|T255138]]
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - [[phab:T255138|T255138]]
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]
* 15:24 jayme: enabled and ran puppet on deploy1001 - [[phab:T260917|T260917]]
* 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - [[phab:T255138|T255138]]
* 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - [[phab:T260917|T260917]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T264209|T264209]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T265183|T265183]]
* 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
* 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
* 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - [[phab:T258405|T258405]]
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:43 moritzm: imported php-memcached, php-redis to component/icu63 [[phab:T264991|T264991]]
* 11:25 Urbanecm: EU B&C window completed
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c63632de6a20b2f00da91187e5cf416fd39d8c5b}}: Enable DiscussionTools as a beta feature on 30 more wikis ([[phab:T264693|T264693]]) (duration: 01m 15s)
* 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 [[phab:T264991|T264991]]
* 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 [[phab:T264991|T264991]]
* 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
* 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks [[phab:T265445|T265445]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json


== 2020-10-13 ==
== 2021-10-09 ==
* 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A ([[phab:T265372|T265372]]) (duration: 01m 04s)
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki ([[phab:T265214|T265214]]) (duration: 01m 04s)
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer ([[phab:T260582|T260582]]) (duration: 01m 04s)
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki ([[phab:T264780|T264780]]) (duration: 01m 04s)
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and it was gone by the next check
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
* 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
* 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
* 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
* 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
* 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
* 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
* 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
* 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
* 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
* 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
* 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
* 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors ([[phab:T263177|T263177]]). promoting to all wikis
* 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw [[phab:T238036|T238036]]
* 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 robh: scs-c1-codfw firmware update via [[phab:T238036|T238036]]
* 17:47 marxarelli: 1.36.0-wmf.13 branched at {{Gerrit|a6be801fc6331a6a6b96f02f368750200d50ab09}} for [[phab:T263179|T263179]]
* 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
* 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors ([[phab:T263177|T263177]]). preparing to promote to group1
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
* 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
* 15:56 papaul: power down ms-be2036 for maintenance
* 15:02 godog: bounce logstash on logstash1007, GC death
* 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b28fd685b9cb8d8e93650b5d02bc41b81d0883c}}: Add setmentor to wgAvailableRights (duration: 00m 59s)
* 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # [[phab:T265336|T265336]]
* 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 [[phab:T264991|T264991]]
* 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # [[phab:T265336|T265336]]
* 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # [[phab:T265336|T265336]]
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance [[phab:T263837|T263837]] ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
* 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 [[phab:T264991|T264991]]
* 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 [[phab:T264991|T264991]]
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90028b4c3c1cd4407e0834d603ccb8b256f2498e}}: Add suppressredirect right to reviewers on bnwiki ([[phab:T265169|T265169]]) (duration: 00m 58s)
* 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # [[phab:T265336|T265336]]`
* 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001, need to wait for a long-running cookbook to end to upgrade both hosts
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e61fcebe7315f73d1fb4d531da37d2c1253115ee}}: Add namespace aliases for Turkish Wikipedia ([[phab:T265336|T265336]]) (duration: 00m 59s)
* 10:47 jayme: no-change rolling restart of push-notifications in codfw - [[phab:T265258|T265258]]
* 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
* 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap [[phab:T264074|T264074]]
* 10:09 ema: cp3050: *reload* varnishkafka-webrequest [[phab:T264074|T264074]]
* 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
* 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:39 kormat: running schema change against s1 in eqiad [[phab:T259831|T259831]]
* 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:43 kormat: running schema change against s3 in eqiad [[phab:T259831|T259831]]
* 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 07:37 moritzm: installing ruby security updates on stretch
* 07:02 moritzm: installing PHP 7.0 security updates
* 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
* 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 [[phab:T263443|T263443]]


== 2020-10-12 ==
== 2021-10-08 ==
* 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - [[phab:T265208|T265208]]
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 15:41 godog: roll-restart logstash5 in codfw
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 [[phab:T264991|T264991]]
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 12:39 moritzm: installing rails security updates on Stretch
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 12:26 moritzm: installing spice security updates on Buster
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 11:38 Urbanecm: EU B&C done
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fff2532424f84970962f7de1e35d4250b83cb3da}}: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4966e8a6b8ae4e6d5623dd35e65ed8fcf3338bc1}}: Enable wgCheckUserLogLogins at all wikis but few large wikis ([[phab:T253802|T253802]]) (duration: 00m 58s)
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631809{{!}}Require autoconfirmed status to edit Wikidata Properties (T254280)]] (duration: 01m 00s)
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 [[phab:T264991|T264991]]
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 07:54 godog: reboot ms-be2036 - [[phab:T265208|T265208]]
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:32 ryankemper: [[phab:T292814|T292814]] Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id [[phab:T292814|T292814]]` on `ryankemper@cumin1001` tmux `elastic`
* 04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
* 04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
* 04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
* 04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
* 03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' {{!}} mwscript purgeList.php , ref [[phab:T287425|T287425]], [[phab:T292810|T292810]]
* 00:07 tgr_: deploy window over
* 00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498{{!}}Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)


== 2020-10-10 ==
== 2021-10-07 ==
* 01:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633281{{!}}Enable session-ip log channel everywhere (T264799)]] (duration: 00m 59s)
* 23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 3/3 (duration: 00m 55s)
* 00:54 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633277{{!}}Enable session-ip log channel on all but enwiki (T264799)]] (duration: 01m 01s)
* 23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 2/3 (duration: 00m 55s)
* 00:18 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633276{{!}}Enable session-ip log channel on eswiki (T264799)]] (duration: 00m 55s)
* 23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 1/3 (duration: 00m 56s)
* 00:13 mutante: built prometheus-nutcracker-exporter for buster and imported on apt1001 (0.2+nmu1)
* 23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 2/2 (duration: 00m 56s)
* 23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 1/2 (duration: 00m 57s)
* 21:35 urbanecm: Password reset for SUL User:LA2-bot ([[phab:T292793|T292793]])
* 20:43 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3
* 20:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281167|T281167]]
* 20:35 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:35 cmooney@cumin1001: START - Cookbook sre.network.cf
* 20:23 krinkle@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Gadgets/: {{Gerrit|I7c858b8c4bc}} (duration: 00m 56s)
* 20:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Echo/: {{Gerrit|8a7ff05ba28f302adb581bf430a868bb815b4ffd}}: Revert "Use namespaced CentralAuthSessionProvider" (duration: 00m 57s)
* 19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/CentralAuth/: {{Gerrit|c01c2e4983bad8582ddd62aeb35ac9be852d493b}}: Revert "Namespace session providers" (duration: 00m 57s)
* 19:44 urbanecm: Backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/727489, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/727487 in an unsafe way -- exceptions at testwikis expected, wmf.3 is not deployed elsewhere, so this should be ok
* 19:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert all wikis to 1.38.0-wmf.2 ([[phab:T281167|T281167]])
* 19:33 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): variously blocked, rolling back to testwikis for safe deploy of backports
* 19:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.38.0-wmf.2
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 19:03 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to all wikis
* 18:50 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=test2wiki
* 18:46 sukhe: running authdns-update for [[phab:T292537|T292537]]
* 18:29 urbanecm: Morning B&C window done
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a946c046ae17a520f8d3463a16b1435ceb4856c}}: Deploy Growth mentor dashboard to pilot wikis ([[phab:T278920|T278920]]) (duration: 01m 04s)
* 18:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 03s)
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 04s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|31770f2b3660e7d7490c0a9ab66285c1f069732d}}: shwiki: Deploy Growth features to newcomers ([[phab:T278240|T278240]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|33526dfed148068585289f5ac501feda72068fd9}}: Stream config changes for android_daily_stats schema ([[phab:T286000|T286000]]) (duration: 01m 06s)
* 18:10 ejegg: updated payments-wiki from {{Gerrit|6d3560d083}} to {{Gerrit|030b11da1a}}
* 18:07 arnoldokoth: gitlab2001 re-image complete ([[phab:T283076|T283076]])
* 17:30 mutante: rebooting gitlab2001.wikimedia.org
* 16:56 arnoldokoth: downtiming gitlab2001 for re-imaging ([[phab:T283076|T283076]])
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 16:32 hnowlan: roll restarting maps cassandra instances for java updates
* 16:19 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:19 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001 (duration: 00m 08s)
* 15:07 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001 (duration: 00m 10s)
* 14:49 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001
* 14:48 hashar: Upgrading Gerrit replica to 3.3.6 # [[phab:T290236|T290236]]
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:56 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:29 hashar: restarting CI Jenkins for git plugin update
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 hashar: Upgraded CI Jenkins on contint2001
* 13:14 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:13 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:06 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 13:05 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:05 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:40 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 12:16 moritzm: installing testvm2005
* 11:59 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 11:52 Lucas_WMDE: EU backport+config window (aka UTC morning) done
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725858{{!}}Enable Content and Section Translation to Kurdish WP (T290238)]] (duration: 01m 04s)
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/WikidataPageBanner/includes/WikidataPageBannerFunctions.php: Backport: [[gerrit:727188{{!}}Change PropertyId to NumericPropertyId (T289125, T292667)]] (duration: 01m 05s)
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 jbond: update puppet stdlib gerrit:726872
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2004.codfw.wmnet
* 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host ms-be2045.codfw.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2005.codfw.wmnet
* 09:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2004.codfw.wmnet
* 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2005.codfw.wmnet
* 08:49 mvernon@cumin2002: START - Cookbook sre.experimental.reimage for host ms-be2045.codfw.wmnet
* 08:36 moritzm: imported jenkins 2.303.2 to thirdparty/ci component for buster-wikimedia
* 07:57 Emperor: re-enabling puppet on ms-be2045 after hw work [[phab:T290881|T290881]]
* 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 07:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 07:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 07:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 07:34 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 07:33 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 07:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 06:21 ryankemper: [Elastic] Restart of `relforge` complete
* 06:05 ryankemper: [Elastic] Cluster in green status, proceeding to next and final node => `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 05:53 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 05:48 ryankemper: [Elastic] Performing rolling restarts of `relforge`. `relforge1003` is the master so I'll restart `relforge1004` first to minimize disruption
* 03:00 ejegg: updated payments-wiki from {{Gerrit|23d0ffac66}} to {{Gerrit|6d3560d083}}
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: enable Parsoid API everywhere (duration: 01m 04s)
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 mutante: [grafana2001:~] $ sudo systemctl start rsync-var-lib-grafana  because of "PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded" because of some race condition where a file vanished during sync


== 2020-10-09 ==
== 2021-10-06 ==
* 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633274{{!}}Enable session-ip log channel on Wikidata (T264799)]] (duration: 00m 59s)
* 23:57 mutante: releases2002 - rm /srv/org/wikimedia/reprepro/conf/distributions - contains only jessie-mediawiki - see 725670 and EOL of MediaWiki 1.31
* 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633272{{!}}Enable session-ip log channel on Commons (T264799)]] (duration: 00m 59s)
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role ([[phab:T260271|T260271]])
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
* 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:21 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:20 jforrester@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ckb.svg: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:16 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726603{{!}}Enable NewUserMessage for ptwikivoyage (T290820)]] (duration: 01m 05s)
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:30 mutante: re-enabling puppet on mw*, an-worker* after deploying gerrit:726954. no issue this time
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:23 mutante: temp. disabling puppet on an-worker*, mw*
* 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:50 mutante: global puppet failure - revert is merged, puppet run will recover on next run everywhere. partially forcing with cumin, partially letting it recover naturally
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:43 mutante: [cumin1001:~] $ sudo cumin -b 5 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633271{{!}}Enable session-ip log channel on group1, except Commons/Wikidata (T264799)]] (duration: 00m 57s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 04s)
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633210{{!}}Enable session-ip log channel on group0 (T264799)]] (duration: 00m 59s)
* 19:05 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 01m 03s)
* 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 06s)
* 19:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 22:01 tgr_: rolling out [[phab:T264799|T264799]]#6533622
* 19:01 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): still unblocked after triage meeting, rolling to group1
* 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # [[phab:T263935|T263935]]
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
* 18:44 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert disabling static mapframes on eswiki (duration: 01m 14s)
* 20:04 dwisehaupt: upgrading payments1001 to buster
* 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 dwisehaupt: upgrading payments1002 to buster
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:31 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eswiki: Disable static mapframes ([[phab:T291736|T291736]]) (duration: 01m 17s)
* 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:30 dwisehaupt: upgrading payments1003 to buster
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:53 dwisehaupt: upgrading payments1004 to buster
* 18:22 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: viwikibooks: Set $wgRestrictDisplayTitle to false ([[phab:T289837|T289837]]) (duration: 01m 21s)
* 17:52 cstone: civicrm revision changed from {{Gerrit|b86a15a430}} to {{Gerrit|585eb835d8}}, config revision is {{Gerrit|57843925bb}}
* 17:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:53 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:47 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 16:43 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to group0
* 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 16:35 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726596{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 04s)
* 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:35 jynus: stopping db1127 for hw maintenance [[phab:T292366|T292366]]
* 13:45 jayme: helm rollback push-notification in eqiad to revision 8
* 16:31 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 16:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:28 brennen@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726597{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 10s)
* 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:01 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 15:45 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): proceeding to deploy backports for [[phab:T292589|T292589]]
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:37 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:35 volans: installed spicerack 1.0.4 on cumin2002
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:48 volans: uploaded spicerack_1.0.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2004.codfw.wmnet
* 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 12:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:18 effie: pool mw1455 mw1422
* 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 12:17 urbanecm: wikiadmin@10.64.0.164(viwiki)> delete from growthexperiments_mentee_data; # cleanup after disabling mentor dashboard backend
* 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2004.codfw.wmnet
* 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1aa67d4846f39f59127a835cb7a8ed2974506025}}: viwiki: Disable mentor dashboard backend ([[phab:T278920|T278920]]) (duration: 01m 06s)
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2003.codfw.wmnet
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:55 XioNoX: esams - Advertise 185.15.59.0/24 instead of 185.15.58.0/23 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
* 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
* 10:50 jelto: disable puppet on gitlab1001 to test puppetized code on GitLab replica - [[phab:T283076|T283076]]
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:07 XioNoX: remove user from all network devices
* 10:04 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 2/2) (duration: 01m 05s)
* 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
* 10:01 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 1/2) (duration: 01m 04s)
* 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 07:36 moritzm: installing xen security updates for buster (libs only)
* 09:19 jbond: update ipaddress6 fact - https://gerrit.wikimedia.org/r/c/operations/puppet/+/726625
* 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:725923{{!}}Don't fail job if subscribed wiki is unknown (T292446 T292440)]] (duration: 01m 15s)
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:29 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 08:21 XioNoX: add ROAs for 185.15.58.0/24 and 185.15.59.0/24 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 08:04 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews --fix # [[phab:T291344|T291344]]
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews # [[phab:T291344|T291344]]
* 07:55 urbanecm: mwdebug1001: scap pull ([[phab:T291344|T291344]] fix done)
* 07:51 urbanecm: Staging at mwdebug1001 for [[phab:T291344|T291344]]
* 05:53 kart_: Updated cxserver to use nodejs12 ([[phab:T290754|T290754]])
* 05:47 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:39 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:36 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/pruneChanges.php --wiki wikidatawiki --number-of-days=2
* 05:31 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:29 ryankemper: [WDQS] `wdqs1012` is back up after restarting blazegraph (blazegraph was locked up)
* 04:27 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (attempting to bring downed `wdqs1012` back into health)
* 04:25 ryankemper: [WDQS] Repooling eqiad hosts following the brief outage from earlier: `wdqs1004`, `wdqs1006`, `wdqs1007`
* 03:19 eileen: civicrm revision changed from {{Gerrit|b6f5f71c18}} to {{Gerrit|82efd2e195}}, config revision is {{Gerrit|f4c57d4733}}
* 03:11 tstarling@deploy1002: Synchronized php-1.38.0-wmf.3/includes/CommentFormatter/RowCommentIterator.php: fix UBN [[phab:T292590|T292590]] (duration: 01m 04s)
* 01:39 legoktm: legoktm@mwmaint1002:~$ echo "https://en.wikiversity.org/static/images/mobile/copyright/wikiversity.svg" {{!}}mwscript purgeList.php
* 01:17 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 03s)
* 01:12 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 17s)
* 00:59 arlolra@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable legacy media dom on metawiki (duration: 01m 05s)
* 00:37 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:35 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 03s)
* 00:32 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:29 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 04s)
* 00:16 mutante: puppetmasters: rm /etc/logrotate.d/geoipupdate && systemctl start logrotate && puppet agent -tv
* 00:14 mutante: puppetmaster2002 - rm /etc/logrotate.d/geoipupdate (not managed by puppet anymore but not removed, caused duplicate logrotate config, made logrotate service fail), start logrotate
* 00:08 cstone: civicrm revision changed from {{Gerrit|34d3c3aae8}} to {{Gerrit|b6f5f71c18}}
* 00:01 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725132{{!}}Add WN as an alias to project namespace in Polish Wikinews (T291344)]] (duration: 01m 04s)


== 2020-10-08 ==
== 2021-10-05 ==
* 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
* 23:54 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikiversity.svg: Config: [[gerrit:725413{{!}}Wikiversity Logo Update for 2017 Logo Version (T292109)]] (duration: 01m 03s)
* 23:37 ryankemper: `cloudelastic1005` done
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 04s)
* 23:31 ryankemper: `cloudelastic1004` done
* 23:44 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-az.svg: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 23s)
* 23:27 ryankemper: `cloudelastic1003` done
* 23:16 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725386{{!}}Add image_suggestion_interaction event stream]] (duration: 01m 12s)
* 23:23 ryankemper: `cloudelastic1002` done
* 23:02 legoktm: deleting old stretch docker images from the registry for [[phab:T292485|T292485]]
* 23:16 tgr_: Evening deploys done
* 22:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
* 22:20 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]) rolling back to testwikis for the day; will revisit in US-morning
* 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632797{{!}}Enable logging of session cookie changes everywhere (T264793)]] (duration: 01m 01s)
* 20:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
* 20:44 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/includes/page: Backport: [[gerrit:726594{{!}}Pre-format comments for non-local files too]] ([[phab:T292570|T292570]]) (duration: 01m 04s)
* 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
* 20:18 mutante: puppetmaster1003 et al - converting maxmind geoip database fetching from cron to timers
* 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
* 20:06 mutante: cumin 'puppetmaster*' "disable-puppet '[[phab:T288844|T288844]] - [[phab:T273673|T273673]] - gerrit:721595 - $<nowiki>{</nowiki>USER<nowiki>}</nowiki>'"
* 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
* 19:30 mutante: restoring /home/amire80 from and to mwmaint2002 via Bacula bconsole ([[phab:T292573|T292573]])
* 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
* 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
* 19:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
* 18:26 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.23 (duration: 01m 57s)
* 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - [[phab:T264793|T264793]] (duration: 01m 01s)
* 18:23 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.21 (duration: 04m 20s)
* 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:21 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]): pruning old branches, starting with 1.37.0-wmf.21, proceeding to 1.37.0-wmf.23 if time allows
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 ppchelko@deploy1002: Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] Php72ToUpper.php removal (duration: 01m 06s)
* 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:04 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] CS.php (duration: 01m 06s)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 45m 59s)
* 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:12 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:09 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 20:43 volans: deploying Netbox DNS zone consolidation - [[phab:T264273|T264273]]
* 17:03 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:02 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
* 16:56 brennen: successfully applied security patches for 1.38.0-wmf.3 train ([[phab:T281167|T281167]])
* 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
* 16:47 brennen: coordinated with deployment backup and starting train prep for 1.38.0-wmf.3 ([[phab:T281167|T281167]]), branched at {{Gerrit|65279490f82c785181b8b6961e40901a4aaafca4}}
* 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide:  (duration: 00m 06s)
* 15:57 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
* 15:57 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632908{{!}}Enable Special:Investigate by default on production (T264357)]] (duration: 01m 06s)
* 15:38 jbond: reimage puppetboard2002
* 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
* 15:15 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
* 15:10 moritzm: imported routinator 0.10.1-1bullseye to thirdparty/routinator for bullseye-wikimedia [[phab:T292503|T292503]]
* 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
* 14:58 jbond: reimage puppetboard1002
* 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:40 effie: depool mw1455 and mw1422
* 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 Pchelolo: run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php [[phab:T219279|T219279]]
* 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor - Drop REL1_31, start REL1_37 (duration: 00m 57s)
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:46 Pchelolo: run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt [[phab:T219279|T219279]]
* 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:39 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - [[phab:T210137|T210137]]
* 13:39 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
* 13:23 ppchelko@deploy1002: Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements [[phab:T219279|T219279]] (duration: 00m 58s)
* 16:19 hashar: Restarting CI Jenkins
* 12:53 ema: upload varnish 6.0.8-1wm1 to apt.wikimedia.org [[phab:T292290|T292290]]
* 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:43 elukey: import AMD ROCm 4.2 to buster-wikimedia's thirdparty/amd-rocm42 - [[phab:T287267|T287267]]
* 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:24 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 11:58 hnowlan: reverted restbase2023 to use CN=hostname certificate due to loading errors
* 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 [[phab:T263443|T263443]]
* 11:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 [[phab:T264991|T264991]]
* 11:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:28 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:17 hnowlan_: disabling puppet on cassandra nodes for rollout of 724061 - defaulting to cn=fqdn certificates
* 12:29 kart_: Updated cxserver to 2020-10-08-053343-production ([[phab:T264407|T264407]], [[phab:T264859|T264859]])
* 11:15 effie: upgrade scap to 4.0.2 - [[phab:T291095|T291095]]
* 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:12 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|04524992865b0ae5750eb6fb0a374aa74a65b383}}: Enable local uploads for tcywiki ([[phab:T166763|T166763]]) (duration: 00m 59s)
* 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:11 vgutierrez: update acme-chief to version 0.32 on acmechief hosts - [[phab:T290249|T290249]]
* 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:09 vgutierrez: update acme-chief to version 0.32 on acmechief-test hosts - [[phab:T290249|T290249]]
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:06 vgutierrez: upload acme-chief 0.32 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:46 hnowlan_: generated cassandra certificate using FQDN for restbase2023
* 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:09 topranks: updating routinator on rpki2001 ([[phab:T291543|T291543]])
* 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 dcausse: depool and restart blazegraph on wdqs1007
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:51 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:58 moritzm: installing apache security updates
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:57 elukey: upgrade GPU drivers (AMD ROCm 4.3.1) on an-worker1[096-101]
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 07:27 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:26 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 07:26 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.wmnet
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
* 06:38 elukey: reboot an-worker1096 after installing new GPU drivers
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
* 04:20 eileen: civicrm revision changed from {{Gerrit|d74e9aa0a1}} to {{Gerrit|34d3c3aae8}}, config revision is {{Gerrit|cae09f7691}}
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
* 10:37 moritzm: installing Postgres security updates on netboxdb1001
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
* 10:32 moritzm: installing Postgres security updates on netboxdb2001
* 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
* 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 godog: roll-restart swift-object-replicator on ms-be2* - [[phab:T261633|T261633]]
* 08:19 kormat: running schema change against s8 in eqiad [[phab:T259831|T259831]]
* 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 gehel: repooling wdqs2002
* 07:55 marostegui: Rebuild db2125 from snapshots - [[phab:T260670|T260670]]
* 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
* 07:40 gehel: depooled wdqs2002 to catch up on lag
* 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
* 07:23 moritzm: installing pyzmq updates from Buster point release
* 07:00 dcausse: depooling wdqs2002 (catching-up lag)
* 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) [[phab:T242453|T242453]]
* 06:51 _joe_: enable notifications for wdqs-ssl-codfw
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 04:05 ejegg: updated fundraising python tools from {{Gerrit|5515923ef7}} to {{Gerrit|d4e08c52de}}
* 00:31 tgr_: evening deploys done
* 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (again, forgot to rebase the previous time) (duration: 00m 59s)
* 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (duration: 00m 57s)
* 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632795{{!}}Enable logging of session cookie changes in group0 (T264793)]] (duration: 00m 58s)


== 2020-10-07 ==
== 2021-10-04 ==
* 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: [[gerrit:632685{{!}}Log when SessionManager is emitting cookies (T264793)]] (duration: 01m 00s)
* 23:30 foks: resetting some emails used for abuse by a globally-banned user
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 23:19 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 23:18 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|75645c9cc59b37dbf59942eabbc014b7dc147626}}: Add explicit config for licensing/copyright message overrides ([[phab:T284097|T284097]]) (duration: 00m 59s)
* 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 23:05 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 22:54 mutante: puppetmaster2001 - rm /etc/logrotate.d/geoipupdate_ipinfo  and geoipupdate_ipinfo ; running puppet, starting logrotate service
* 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 18:13 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 16:51 bblack: rolling restart of haproxy for DoTLS on dns300[12],authdns1001,authdns2001 to recycle connections
* 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 15:24 vgutierrez: pool cp5006
* 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
* 15:17 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
* 15:16 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in [[phab:T264859|T264859]]. https://en.wikipedia.org/wiki/Inari_Sami {{!}} https://iso639-3.sil.org/code/smn
* 14:50 phuedx: phuedx@mwmaint1002:~$ mwscript extensions/SecurePoll/cli/purgeDecryptionKeys.php --wiki=votewiki --before="20210101000000"
* 18:30 ryankemper: search team's backport deploy is complete
* 14:46 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]] (duration: 00m 58s)
* 14:46 effie: uploading scap 4.0.2 - [[phab:T291095|T291095]]
* 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]]'`
* 14:45 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
* 14:39 brennen: gitlab: upgrade to 14.3.2 (note there was an additional patch release on 2021-10-01) complete ([[phab:T292256|T292256]])
* 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
* 14:25 Amir1: cleaning up wb_changes_subscription rows from closed wikis ([[phab:T292440|T292440]])
* 18:20 ryankemper: (backport) HEAD set to {{Gerrit|834b4571f978674162fa805906e665e35ac68e27}} as expected
* 14:24 brennen: gitlab: downtime for upgrade to 14.3.1
* 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - [[phab:T261260|T261260]] (duration: 01m 01s)
* 14:19 elukey: import AMD ROCm 4.3.1 packages in buster-wikimedia's thirdparty/amd-rocm431 - [[phab:T287267|T287267]]
* 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:725905{{!}}Explicitly enable dispatching and pruning for wikidata (T48643)]] (duration: 00m 58s)
* 16:39 jgleeson: updated civicrm from {{Gerrit|39b4f954ed}} to {{Gerrit|b86a15a430}}
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo [[phab:T242602|T242602]]
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - [[phab:T259780|T259780]]
* 14:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:01 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:725502{{!}}Enable dispatching via jobs everywhere (T48643)]] (duration: 01m 00s)
* 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725785{{!}}Enable dispatching for wikidatawiki and commonswiki (T292088)]] (duration: 01m 00s)
* 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 ([[phab:T263986|T263986]])
* 12:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 11:55 urbanecm: EU B&C window done
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:55 urbanecm@deploy1002: Synchronized multiversion/MWWikiversions.php: {{Gerrit|508cf5cc6d213373f7c9ba1cdef142ebc8398022}}: Let DB expressions intersect DB lists ([[phab:T290609|T290609]]) (duration: 00m 58s)
* 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
* 11:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a855078cf52d88cc2cd27a0adc7c6a680c80dd39}}: dewiki, nlwiki: Bump Growth features to 80% ([[phab:T288420|T288420]], [[phab:T285254|T285254]]) (duration: 00m 58s)
* 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 11:46 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: {{Gerrit|5728376}}: Update [[phab:T250887|T250887]] mitigations (duration: 00m 58s)
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b0a96bed4562bcc975187b1d34626201d407404b}}: Undeploy GettingStarted V: Remove now-obsolete logging channels ([[phab:T235752|T235752]]) (duration: 00m 59s)
* 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:42 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|9709bcfc8dacbcd1704471df08c31cec0711bea6}}: Undeploy GettingStarted IV: Dont build i18n ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d60f332785868797e7ecc9b5e410616d5604b392}}: Undeploy getting started III: Dont set wmgUseGettingStarted, now ignored ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
* 11:37 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|9eaf960c4b7c304be57dfc8d248aca0c6501d04c}}: Undeploy GettingStarted II: Dont load regardless of config ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 11:22 Urbanecm: EU B&C window done
* 11:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7405ad1eb323a8da524819f17d6f1a66afaa57}}: Undeploy GettingStarted I: Disable on all wikis ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f85bc3056f809910c0487fb0b0559b3de92b1992}}: Enable bot passwords at all fishbowl and private wikis ([[phab:T258356|T258356]]) (duration: 00m 58s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724992{{!}}Remove deprecated SectionTranslationTargetLanguage config (T290302)]] (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725042{{!}}Add wikisource-bot.toolforge.org to Commons copy upload list (T292213)]] (duration: 00m 59s)
* 11:14 urbanecm@deploy1001: sync-file aborted: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
* 11:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720058{{!}}Add IA-Upload tool domains to Commons wgCopyUploadsDomains (T287241)]] (duration: 00m 59s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6cdeea2c4c15780a641722157584f12febedab2a}}: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia ([[phab:T264161|T264161]]) (duration: 00m 59s)
* 11:12 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 [[phab:T263443|T263443]]
* 11:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
* 11:07 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
* 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
* 11:04 effie: depool  wtp1026 for tests
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 [[phab:T264755|T264755]] ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
* 11:04 effie: pool  wtp1025
* 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
* 10:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 akosiaris: hbal -L -G row_C -X on ganeti01.svc.eqiad.wmnet
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 54s)
* 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - [[phab:T264588|T264588]]
* 08:58 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad
* 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
* 07:37 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc] (duration: 06m 14s)
* 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:31 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc]
* 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 07:30 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc] (duration: 00m 06s)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl [[phab:T264700|T264700]]', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
* 07:30 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc]
* 07:14 marostegui: Stop MySQL es2015 for decommissioning [[phab:T264700|T264700]]
* 07:29 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc] (duration: 19m 18s)
* 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:19 dcausse: restarting blazegraph on wdqs2001 & wdqs2004 (allocators burning too quickly)
* 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:18 elukey: depool + restart blazegraph + restart updater for wdqs1006
* 02:37 eileen: civicrm revision changed from {{Gerrit|a30da7f92a}} to {{Gerrit|39b4f954ed}}, config revision is {{Gerrit|0ca9a3a055}}
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1006.wmnet
* 01:00 cdanis: repool esams; cr2-esams router upgrade complete
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1004.wmnet
* 00:43 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request chassis routing-engine master switch
* 07:10 joal@deploy1002: Started deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc]
* 00:40 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system reboot other-routing-engine
* 07:02 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
* 06:44 elukey: depool + restart blazegraph + restart updater on wdqs1004
* 00:26 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request chassis routing-engine master switch
* 05:50 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 00:22 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system reboot other-routing-engine
* 05:49 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 00:15 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
* 05:47 ladsgroup@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs ([[phab:T252526|T252526]])


== 2020-10-06 ==
== 2021-10-03 ==
* 23:55 mutante: 🖧  switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* ([[phab:T252526|T252526]]) 🖧
* 14:45 _joe_: restarting acmechief on acmechief1001
* 23:53 mutante: 🖧  switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* ([[phab:T252526|T252526]]) 🖧
* 12:55 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1127, bad ram', diff saved to https://phabricator.wikimedia.org/P17414 and previous config saved to /var/cache/conftool/dbconfig/20211003-125530-kormat.json
* 23:52 mutante: 🖧  switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* ([[phab:T252526|T252526]]) 🖧
* 08:24 elukey: powercycle cp5006 (unresponsive to ssh, remote tty available but not able to login as root, no prometheus metrics in hours)
* 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 08:23 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet
* 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
* 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
* 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
* 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
* 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 Urbanecm: Morning B&C done
* 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: {{Gerrit|2118d265c0f5b6c914efeba86ba7eacd30c5ee0f}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 00s)
* 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: {{Gerrit|d428ccbdf3be9a45139f8b8c0874c113f1732198}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 03s)
* 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 58s)
* 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 59s)
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 [[phab:T264043|T264043]] (duration: 00m 59s)
* 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 [[phab:T264637|T264637]] (duration: 00m 58s)
* 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 [[phab:T264637|T264637]] (duration: 00m 58s)
* 15:41 godog: centrallog* delete archived logs from old, single file, organization
* 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
* 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - [[phab:T263789|T263789]]
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - [[phab:T262946|T262946]]
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
* 14:36 hnowlan: repooling restbase2009
* 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
* 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 05s)
* 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
* 14:08 marostegui: Reboot db1076 for kernel upgrade [[phab:T264755|T264755]]
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:03 marostegui: Power cycle db1076 [[phab:T264755|T264755]]
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
* 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
* 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
* 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - [[phab:T264157|T264157]]
* 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 [[phab:T263443|T263443]]
* 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
* 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
* 12:08 jbond42: deploy puppetlabs-stdlib 5.2
* 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 Urbanecm: EU B&C window done
* 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T264430|T264430]] # P12930
* 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|07c19f97c79ec20d6b1657e589acfc242dd53b09}}: arbcom_ruwiki: Set AK as alias for NS_PROJECT ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b1a4fad0f55c626e42961489062115d5f97ed6c}}: ruewiki: Add rollbacker, grantable and revokable by sysops ([[phab:T264147|T264147]]) (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5cc7027ba8d0ddee5c9898b80afe850603bf870e}}: Allow bureaucrats to remove sysop permissions on Commons ([[phab:T261481|T261481]]) (duration: 00m 58s)
* 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5f9721b3300c8e733d331bcbc754d31d9493f8ba}}: GrowthExperiments: Change Help Page URL for kowiki ([[phab:T254364|T254364]]) (duration: 01m 00s)
* 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
* 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:48 effie: set mw2279.codfw.wmnet as inactive [[phab:T264698|T264698]]
* 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
* 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
* 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
* 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
* 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
* 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
* 09:59 effie: enable puppet on mc20*
* 09:41 effie: enable puppet on mc10*
* 09:38 effie: disable puppet on mc*
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
* 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
* 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 [[phab:T263443|T263443]]
* 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo [[phab:T264700|T264700]] [[phab:T264386|T264386]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 [[phab:T264700|T264700]] ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl [[phab:T264386|T264386]]', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json


== 2020-10-05 ==
== 2021-10-02 ==
* 23:11 ejegg: updated payments staging from {{Gerrit|52704ffe24}} to {{Gerrit|db03677b2d}}
* 17:28 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:27 mutante: removing shinken puppet module and role
* 16:10 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster [[phab:T264053|T264053]]
* 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings [[phab:T264053|T264053]]
* 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings [[phab:T264053|T264053]]
* 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings [[phab:T264053|T264053]]
* 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
* 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
* 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
* 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
* 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]] (duration: 22m 23s)
* 14:25 elukey: shutdown an-master1001 for ram expansion
* 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]]
* 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:54 elukey: shutdown stat1005 for ram upgrade
* 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
* 12:39 moritzm: installing curl security updates on remaining hosts
* 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" ([[phab:T264295|T264295]]) (duration: 00m 59s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|be73f155001e9095697c3c21a208c63e7bf5d2d1}}: Move changetags right from users to sysop [trwiki] ([[phab:T264508|T264508]]) (duration: 00m 59s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd30b626e23b48146b970c72731f8f7bb1eee9e1}}: wgSkipSkins: Exclude contenttranslation skin from skin options for users ([[phab:T263093|T263093]]) (duration: 00m 59s)
* 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 [[phab:T264398|T264398]]
* 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 [[phab:T264398|T264398]]
* 10:08 moritzm: installing ldap-replica1002 [[phab:T264390|T264390]]
* 09:52 moritzm: installing ldap-replica1001 [[phab:T264390|T264390]]
* 09:22 moritzm: installing ldap-replica2003 [[phab:T264390|T264390]]
* 09:02 hnowlan: bootstrapping restbase1030-b
* 08:57 moritzm: installing ldap-replica2004 [[phab:T264390|T264390]]
* 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
* 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
* 08:23 godog: prometheus codfw/ops, add 100G to the LV
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:46 marostegui: Stop mysql on es2017 [[phab:T264386|T264386]]
* 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 06:52 XioNoX: add static NAT to pfw3-eqiad - [[phab:T264356|T264356]]
* 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 [[phab:T264386|T264386]] ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json


== 2020-10-03 ==
== 2021-10-01 ==
* 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: {{Gerrit|840545f1d9115ea6b672cecce1762d850d8b1f54}}: Restrict flow-hide right to autoconfirmed users on zhwiki ([[phab:T264489|T264489]]) (duration: 01m 17s)
* 23:19 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 00:08 ejegg: updated fundraising CiviCRM from {{Gerrit|256adda03c}} to {{Gerrit|a30da7f92a}}
* 22:27 mutante: puppetmaster2001 - systemctl reset-failed
* 22:16 mutante: puppetmaster2001 systemctl disable geoip_update_ipinfo.timer
* 22:15 mutante: puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for [[phab:T288844|T288844]]
* 21:56 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:44 mutante: puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - [[phab:T288844|T288844]]
* 21:19 mutante: puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' [[phab:T273673|T273673]]
* 21:12 mutante: puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet, they are backends. backends don't have the sync cron/job/timer, so noop as well, just like 1004/1005/2004/2005. this just leaves the actual change on 2001 - [[phab:T273673|T273673]]
* 21:07 mutante: puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role
* 21:06 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s)
* 21:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend
* 21:05 mutante: puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s)
* 20:58 mutante: temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) [[phab:T273673|T273673]]
* 18:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 18:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 18:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 18:07 robh@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet
* 18:05 robh@cumin1001: START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet
* 17:58 effie: depool mw1025, mw1319, mw1312 for test
* 16:20 dancy: testing upcoming Scap 4.0.2 release on beta
* 14:04 bblack: C:envoyproxy (appservers and others): restarting envoyproxy
* 14:04 bblack: C:envoyproxy (appservers and others): ca-certificates updated via cumin to work around [[phab:T292291|T292291]] issues
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 bblack: manually trying LE expired root workaround on mwdebug1001 with puppet disabled ...
* 13:12 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:11 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 13:11 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 11:42 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 jynus: manually migrating some vms out of ganeti1009 to avoid excessive memory pressure
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json
* 10:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s)
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json
* 10:43 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17410 and previous config saved to /var/cache/conftool/dbconfig/20211001-104232-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17409 and previous config saved to /var/cache/conftool/dbconfig/20211001-102841-root.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17408 and previous config saved to /var/cache/conftool/dbconfig/20211001-102728-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17407 and previous config saved to /var/cache/conftool/dbconfig/20211001-101338-root.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17406 and previous config saved to /var/cache/conftool/dbconfig/20211001-101224-root.json
* 10:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad (duration: 00m 51s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17405 and previous config saved to /var/cache/conftool/dbconfig/20211001-095834-root.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17404 and previous config saved to /var/cache/conftool/dbconfig/20211001-095720-root.json
* 09:55 marostegui: Upgrade db1164 and db1177
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 and db1164 for upgrade', diff saved to https://phabricator.wikimedia.org/P17403 and previous config saved to /var/cache/conftool/dbconfig/20211001-095433-marostegui.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17402 and previous config saved to /var/cache/conftool/dbconfig/20211001-094913-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17401 and previous config saved to /var/cache/conftool/dbconfig/20211001-094902-root.json
* 09:38 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force # to get an idea about timing for [[phab:T290609|T290609]], runs in a tmux session under my account
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17400 and previous config saved to /var/cache/conftool/dbconfig/20211001-093410-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17399 and previous config saved to /var/cache/conftool/dbconfig/20211001-093358-root.json
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17398 and previous config saved to /var/cache/conftool/dbconfig/20211001-091906-root.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17397 and previous config saved to /var/cache/conftool/dbconfig/20211001-091854-root.json
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17396 and previous config saved to /var/cache/conftool/dbconfig/20211001-090402-root.json