
Difference between revisions of "Server Admin Log"

imported>Stashbot
(XenoRyet: updated civicrm from cf86495d44 to 8c77e9e915)
imported>Stashbot
(ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0))
 
(547 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-02-14 ==
== 2021-10-16 ==
* 23:42 XenoRyet: updated civicrm from {{Gerrit|cf86495d44}} to {{Gerrit|8c77e9e915}}
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:01 volker-e@deploy1001: Finished deploy [design/style-guide@1928c00]: Deploy design/style-guide:  (duration: 00m 09s)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:01 volker-e@deploy1001: Started deploy [design/style-guide@1928c00]: Deploy design/style-guide:
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent some logspam [[phab:T245280|T245280]] (duration: 01m 05s)
* 19:27 XenoRyet: updated civicrm from {{Gerrit|55b2afb6eb}} to {{Gerrit|cf86495d44}}
* 19:10 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase: [[phab:T245062|T245062]] Prevent invalid term languages from cached PrefetchingTermLookup (duration: 01m 09s)
* 17:37 jforrester@deploy1001: Unlocked for deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (duration: 03m 05s)
* 17:33 jforrester@deploy1001: Locking from deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (planned duration: 60m 00s)
* 16:11 moritzm: installing git-lfs updates from Buster 10.3 point update
* 15:55 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb10u1 to apt.wikimedia.org
* 15:55 bblack: (log(n))
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10414 and previous config saved to /var/cache/conftool/dbconfig/20200214-155443-marostegui.json
* 15:52 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb9u1 to apt.wikimedia.org
* 15:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Resync InitialiseSettings to try and pick up previously deployed cirrus query routing changes (duration: 01m 05s)
* 15:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 effie: restart mc-gp* for updates
* 15:17 bd808: Toil reduction: !log messages now work from the SRE team's Freenode channel.
* 13:50 gehel: restart relforge for JVM upgrade - [[phab:T245120|T245120]]
* 10:35 vgutierrez: revert ats 8.0.6-rc0 experiment on cp40[26,32]
* 10:14 vgutierrez: rolling restart of ats-be to enable TLSv1.3 against origin servers - [[phab:T170567|T170567]]
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10409 and previous config saved to /var/cache/conftool/dbconfig/20200214-093456-marostegui.json
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:25 volans: manually absented /usr/local/bin/apt2xml on the 5 hosts with puppet disabled
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:46 moritzm: installing 4.19.98 kernel update on Buster systems
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10408 and previous config saved to /var/cache/conftool/dbconfig/20200214-080600-marostegui.json
* 06:51 vgutierrez: updating puppet compiler facts
* 01:27 dpifke@deploy1001: Finished deploy [performance/navtiming@2eec00a]: (no justification provided) (duration: 00m 05s)
* 01:27 dpifke@deploy1001: Started deploy [performance/navtiming@2eec00a]: (no justification provided)
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245202|T245202]] cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
* 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: [[phab:T245202|T245202]] cirrus: Increase the pool counter limits a bit (duration: 01m 05s)
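
The entries above share a common shape (<code>* HH:MM actor: message</code>), and cookbook runs log matching START and END lines with a status and exit code. A minimal sketch of how such entries could be tallied follows. The regular expressions and the function names <code>parse_entry</code> and <code>cookbook_outcomes</code> are assumptions made for illustration, inferred only from the entries on this page; they are not part of Stashbot or any Wikimedia tooling.

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the entries on this page:
#   * HH:MM actor: message
ENTRY_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$"
)

# Assumed shape of a cookbook completion message, e.g.
#   END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
COOKBOOK_END_RE = re.compile(
    r"^END \((?P<status>[A-Z]+)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def parse_entry(line):
    """Split one log bullet into time, actor and message; None if it does not match."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def cookbook_outcomes(lines):
    """Count cookbook END entries per (cookbook name, status) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # headings, blank lines, or entries in another shape
        end = COOKBOOK_END_RE.match(entry["message"])
        if end:
            counts[(end["name"], end["status"])] += 1
    return counts
```

Called on the lines of a single day, <code>cookbook_outcomes</code> gives a quick view of how many runs of each cookbook passed or failed; entries that do not match the assumed shape are skipped rather than guessed at.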


== 2020-02-13 ==
== 2021-10-15 ==
* 22:13 jeh: running filesystem tests on cloudvirt1024 [[phab:T241884|T241884]]
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:40 jbond42: refresh facts on compilers
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:34 mutante: apt2001 - upgraded nginx
* 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying the 'logging-external' release) (safe because eventgate-logging-external is not in use)  - [[phab:T245203|T245203]]
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 20:33 marxarelli: rollback to group1 due to 500 spike (2k/min) ([[phab:T233867|T233867]])
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 20:32 dduvall@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:30 marxarelli: varnish 500 spike. rolling back
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:20 gehel: restarting blazegraph + updater on wdqs2006
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.19
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/api/ApiRollback.php: [[phab:T245159|T245159]] ApiRollback: Properly deal with UserIdentity (duration: 01m 04s)
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245182|T245182]] ResourceLoaderSkinModule: Don't hard-deprecate wgLogoHD just now (duration: 01m 03s)
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T219534|T219534]] Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:10 moritzm: installing e2fsprogs security updates
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 18:48 bblack: ns1.wikimedia.org - re-routing back to authdns2001 instead of dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 18:38 bblack: authdns2001 - reboot - [[phab:T242017|T242017]]
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:36 bblack: ns1.wikimedia.org - re-routing from authdns2001 to dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 18:09 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I9d0c8af3c577}} (duration: 01m 06s)
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:00 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iae1f45896}} (duration: 01m 06s)
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 17:59 volans: downtimed mgmt in eqiad for 1h
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iae1f45896}} (duration: 01m 08s)
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:49 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ibfca686f681}} (duration: 01m 06s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:41 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iefff596955e}} (duration: 01m 08s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:40 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iefff596955e}} (duration: 01m 06s)
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:35 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:32 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op (code style only) deploy sync (duration: 01m 07s)
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:09 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php No-op (code style only) deploy sync (duration: 00m 04s)
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:09 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php No-op (code style only) deploy sync
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:32 robh: ps1-a8-codfw.mgmt.codfw.wmnet firmware upgraded via [[phab:T245164|T245164]]
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 16:28 papaul: rebooting elastic2043 for firmware upgrade
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:22 gehel: canceled the restart of elastic2043 - [[phab:T243715|T243715]]
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 16:21 gehel: restarting elastic2043 - [[phab:T243715|T243715]]
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:10 _joe_: depooling/repooling mw1240
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 16:02 _joe_: pooled mw1238 again
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:59 _joe_: depooling mw1238 for analysis
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:42 vgutierrez: rolling restart of ats-be on esams - [[phab:T170567|T170567]]
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 15:38 vgutierrez: disable allow_half_open on ats-tls @ cp4031 - [[phab:T236458|T236458]]
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:27 vgutierrez: turning on TLSv1.3 between ats-be and applayer in cp30[51-52] - [[phab:T170567|T170567]]
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikibaseMediaInfo/resources/: UBN fix: Force non-value to be undefined (duration: 01m 06s)
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 14:51 vgutierrez: test TLSv1.3 between ats-be and applayer in cp3050 - [[phab:T170567|T170567]]
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:47 XioNoX: re-image rpki2001 - [[phab:T244585|T244585]]
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:33 XioNoX: add routinator_0.6.4_amd64.deb to buster-wikimedia apt repo
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10405 and previous config saved to /var/cache/conftool/dbconfig/20200213-142735-marostegui.json
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:24 XioNoX: re-enable ping offload in esams - [[phab:T244584|T244584]]
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 13:31 XioNoX: disable ping offload in esams - [[phab:T244584|T244584]]
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:24 XioNoX: re-enable ping offload in eqiad - [[phab:T244584|T244584]]
* 06:20 urbanecm: Start server-side upload for 1 video file
* 13:06 XioNoX: disable ping offload in eqiad - [[phab:T244584|T244584]]
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 13:03 XioNoX: re-enable ping offload in codfw - [[phab:T244584|T244584]]
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:00 vgutierrez: pool cp10[75,76] running buster - [[phab:T242093|T242093]]
* 00:07 brennen: end of UTC late backport & config training window
* 12:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 Amir1: EU SWAT is done
* 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]], take II, the cache issue ([[phab:T219123|T219123]] [[phab:T225055|T225055]]) (duration: 01m 03s)
* 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]] (duration: 01m 03s)
* 12:29 vgutierrez: depool cp10[75,76] and reimage as buster - [[phab:T242093|T242093]]
* 12:28 vgutierrez: pool cp10[77,78] running buster - [[phab:T242093|T242093]]
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571956{{!}}Revert: Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 12:18 XioNoX: re-image ping2001 to buster - [[phab:T244584|T244584]]
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1c81925}}: Create Test Custodians group at Beta Wikiversity ([[phab:T240438|T240438]]) (duration: 01m 07s)
* 12:13 XioNoX: disable ping offload in codfw
* 12:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0f035e4}}: Update wgAvailableRights declaration of autoreviewprotected ([[phab:T230103|T230103]]) (duration: 01m 03s)
* 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|176b0e8}}: Grant autopatrol to azwiki patrollers ([[phab:T244338|T244338]]) (duration: 01m 05s)
* 11:53 vgutierrez: depool cp10[77,78] and reimage as buster - [[phab:T242093|T242093]]
* 11:52 vgutierrez: pool cp10[79,80] running buster - [[phab:T242093|T242093]]
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:18 vgutierrez: rolling upgrade of ATS to version 8.0.5-1wm16 fleet wide - [[phab:T244464|T244464]]
* 11:16 vgutierrez: depool cp10[79,80] and reimage as buster - [[phab:T242093|T242093]]
* 11:12 ema: A:cp re-enable puppet, leave it to cron to apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 11:08 vgutierrez: upload trafficserver 8.0.5-1wm16 to apt.wm.o (buster) - [[phab:T244464|T244464]]
* 11:02 vgutierrez: pool cp10[81,82] and reimage as buster - [[phab:T242093|T242093]]
* 10:59 ema: cp4021 (cache_upload): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:49 ema: cp4027 (cache_text): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 vgutierrez: removing /root/.ssh/known_hosts in cumin1001
* 10:21 vgutierrez: pool cp10[83,84] running buster - [[phab:T242093|T242093]]
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 vgutierrez: depool cp10[83,84] and reimage as buster - [[phab:T242093|T242093]]
* 09:45 vgutierrez: pool cp10[85,86] running buster - [[phab:T242093|T242093]]
* 09:10 moritzm: installing Java security updates on elastic* and relforge*
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10403 and previous config saved to /var/cache/conftool/dbconfig/20200213-085957-marostegui.json
* 08:57 gehel: restart elasticsearch on elastic2051 - JVM upgrade
* 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:57 moritzm: installing Java security updates on Hadoop, Kafka/Jumbo, AQS and Druid canaries
* 07:57 vgutierrez: depool cp10[85,86] and reimage as buster - [[phab:T242093|T242093]]
* 07:53 moritzm: rolling restart of restbase-dev to pick up Java security update
* 07:49 vgutierrez: pool cp10[87,88] running buster - [[phab:T242093|T242093]]
* 07:49 vgutierrez: testing ATS 8.0.5-1wm16 + KA between ats-tls and varnish-fe in cp4031 - [[phab:T244464|T244464]]
* 07:47 moritzm: installing Java security updates on stat/SWAP hosts
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10402 and previous config saved to /var/cache/conftool/dbconfig/20200213-072839-marostegui.json
* 07:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:03 vgutierrez: depool cp10[87,88] and reimage as buster - [[phab:T242093|T242093]]
* 07:02 vgutierrez: pool cp10[89,90] running buster - [[phab:T242093|T242093]]
* 06:49 vgutierrez: pool cp20[02,05] running buster - [[phab:T242093|T242093]]
* 06:36 marostegui: Upgrade and compress db1087, this will generate lag on s8 on the wiki replicas - [[phab:T232446|T232446]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10401 and previous config saved to /var/cache/conftool/dbconfig/20200213-063535-marostegui.json
* 06:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1099:3318 into vslow for s8 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10400 and previous config saved to /var/cache/conftool/dbconfig/20200213-063334-marostegui.json
* 06:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10399 and previous config saved to /var/cache/conftool/dbconfig/20200213-063207-marostegui.json
* 06:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10398 and previous config saved to /var/cache/conftool/dbconfig/20200213-062642-marostegui.json
* 06:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10397 and previous config saved to /var/cache/conftool/dbconfig/20200213-062148-marostegui.json
* 06:19 vgutierrez: testing a new build of ATS 8.0.6 in cp40[26,32]
* 06:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10396 and previous config saved to /var/cache/conftool/dbconfig/20200213-061219-marostegui.json
* 06:11 vgutierrez: depool cp10[89,90] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: depool cp20[02,05] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: pool cp20[01,08] running buster - [[phab:T242093|T242093]]
* 06:02 twentyafterfour: set phabricator read-only to false
* 06:01 twentyafterfour: set phabricator read-only
* 06:00 marostegui: Start phabricator maintenance [[phab:T244566|T244566]]
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:53 marostegui: Upgrade db1128 without restarting mysql - [[phab:T244566|T244566]]
* 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:47 marostegui: Silence m3 hosts for maintenance - [[phab:T244566|T244566]]
* 05:38 vgutierrez: depool cp2008 and reimage as buster - [[phab:T242093|T242093]]
* 05:37 vgutierrez: pool cp2011 running buster - [[phab:T242093|T242093]]
* 05:35 vgutierrez: depool cp2001 and reimage as buster - [[phab:T242093|T242093]]
* 05:34 vgutierrez: pool cp2004 running buster - [[phab:T242093|T242093]]
* 05:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:09 vgutierrez: depool cp20[04,11] and reimage as buster - [[phab:T242093|T242093]]
* 03:57 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:57 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:32 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:10 twentyafterfour: no apparent problems with phabricator upgrade, all done
* 01:01 twentyafterfour: starting phabricator deploy, momentary downtime expected while apache restarts
* 00:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:45 niharika29@deploy1001: Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - [[phab:T244488|T244488]] (duration: 01m 07s)
* 00:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-12 ==
== 2021-10-14 ==
* 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 21:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 21:53 chaomodus: restart nagios-nrpe-service on cumin1001 after it had oomed
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 21:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:10 marxarelli: completed group1 to 1.35.0-wmf.19
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 21:00 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.19 (duration: 01m 03s)
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 20:59 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.19
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:49 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232563|T232563]] - Remove SERVER_SOFTWARE override (duration: 01m 03s)
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:39 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T72470|T72470]] - Disable wgLegacyJavaScriptGlobals on svwiki (duration: 01m 08s)
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 19:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Don't use hex escapes in the name of cawiki (duration: 01m 04s)
* 22:31 mutante: depooling mw1452 for testing
* 19:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243503|T243503]] [itwiki] Move assignment of 'mover' group from sysops to bureaucrats (duration: 01m 02s)
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 19:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243509|T243509]] [zh_classicalwiki] Enable new user message for auto-created accounts (duration: 01m 03s)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 19:38 James_F: Ran mwscript maintenance/namespaceDupes.php --wiki=mywiki --fix and mwscript maintenance/namespaceDupes.php --wiki=mywiktionary --fix on mwmaint1002
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 19:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244980|T244980]] Localise $wgMetaNamespace for mywiki and mywiktionary (duration: 01m 03s)
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 19:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244205|T244205]] [newiki] Set local timezone to Kathmandu (duration: 01m 03s)
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T241883|T241883]] [fywiktionary] Set a local wgSitename (duration: 01m 03s)
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:12 jforrester@deploy1001: Synchronized wmf-config/throttle-analyze.php: Replace deprecated IP class with IPUtils (no-op sync) (duration: 01m 03s)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:31 mutante: irc2001 - manually run the "$<nowiki>{</nowiki>v6_token_cmd<nowiki>}</nowiki> && $<nowiki>{</nowiki>v6_flush_dyn_cmd<nowiki>}</nowiki>" commands from interface::add_ip6_mapped to debug 'Interface::Add_ip6_mapped[main]/Augeas[ens5_v6_token]: Could not evaluate: Saving failed' but it does not reproduce the puppet error ... ([[phab:T244719|T244719]])
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/pager/IndexPager.php: [[phab:T244941|T244941]] IndexPager: Cast properties passed to implode to arrays (duration: 01m 03s)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:27 jeh: upgrade RAID firmware on cloudvirt1024 to 25.5.6.0009 [[phab:T241884|T241884]]
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 17:22 bblack: ns1.wikimedia.org - re-route back to original authdns2001 destination
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:11 brennen: restarting jenkins for updates
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:09 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:01 vgutierrez: rolling back cp4026 and cp4032 to trafficserver 8.0.5-1wm15
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 17:00 vgutierrez: depool cp40[26,32]
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 16:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 16:52 vgutierrez: pool cp20[06,14] running buster - [[phab:T242093|T242093]]
* 18:41 urbanecm: UTC evening B&C done
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 16:49 moritzm: installing openjpeg2 security updates
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 15:56 vgutierrez: Enable KA and disable parent proxies on cp4031 - [[phab:T244464|T244464]]
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 15:50 vgutierrez: depool cp20[06,14] and reimage as buster - [[phab:T242093|T242093]]
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:49 volans: spicerack upgraded to 0.0.30-1 on both cumin hosts
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:48 vgutierrez: pool cp20[07,17] running buster - [[phab:T242093|T242093]]
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:46 bblack: authdns2001 - shutting down for hardware work - [[phab:T242017|T242017]]
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 15:39 jeh: clearing foreign drive RAID configuration on cloudvirt1024 [[phab:T241884|T241884]]
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 15:32 marostegui: Disable event handler for db1095 RAID check on icinga - [[phab:T244958|T244958]]
* 17:42 rzl: depool mw1452 for training
* 15:32 marostegui: Disable event handler for db1095 RAID check on icinga -
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:25 jeh: upgrade BIOS firmware on cloudvirt1024 to 2.4.8 [[phab:T241884|T241884]]
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 15:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:02 vgutierrez: depool cp20[07,17] and reimage as buster - [[phab:T242093|T242093]]
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 14:34 XioNoX: repool eqsin
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:31 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:00 vgutierrez: pool cp20[10,18] running buster - [[phab:T242093|T242093]]
* 16:33 moritzm: installing node-ansi-regex security updates
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10393 and previous config saved to /var/cache/conftool/dbconfig/20200212-135514-marostegui.json
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 13:39 akosiaris: revert sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 13:36 XioNoX: re-enable transit/peering on cr1-eqsin - [[phab:T244944|T244944]]
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 13:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 13:23 akosiaris: mangle sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 13:22 XioNoX: cr1-eqsin RE failover (final) - [[phab:T244944|T244944]]
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 13:21 marostegui: Restart wikibugs as phab comments aren't showing up on irc - [[phab:T241109|T241109]]
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:18 jynus: setting up db1140 under maintenance (upgrade, reboot, disable alerts)
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 13:15 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 13:10 moritzm: upgrading debdeploy fleet-wide to 0.0.99.13
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 13:08 moritzm: uploaded libapache2-mod-auth-cas 1.2-1~deb8u1 for jessie-wikimedia to apt.wikimedia.org
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 13:05 vgutierrez: depool cp20[10,18] and reimage as buster - [[phab:T242093|T242093]]
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:05 vgutierrez: pool cp20[12,20] running buster - [[phab:T242093|T242093]]
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 12:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 12:53 XioNoX: cr1-eqsin RE failover - [[phab:T244944|T244944]]
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 12:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 12:35 vgutierrez: depool cp20[12,20] and reimage as buster - [[phab:T242093|T242093]]
* 14:43 jbond: migrate apt.w.o to a DNS active/passive discovery address (cc moritzm)
* 12:34 vgutierrez: pool cp20[13,22] running buster - [[phab:T242093|T242093]]
* 14:23 moritzm: installing krb5 security updates on KDCs
* 12:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 12:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 12:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]], take II, the cache issue (duration: 01m 03s)
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 12:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571412{{!}}Enable ContentTranslation out of beta in bs and mk WPs (T244139, T244140)]] (duration: 01m 15s)
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:08 vgutierrez: depool cp2013 and reimage as buster - [[phab:T242093|T242093]]
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 12:06 vgutierrez: pool cp2016 running buster - [[phab:T242093|T242093]]
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 12:01 vgutierrez: depool cp20[16,22] and reimage as buster - [[phab:T242093|T242093]]
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 11:57 vgutierrez: pool cp20[19,24] running buster - [[phab:T242093|T242093]]
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:53 akosiaris: mangle sessionstore on mw1331 so that it is unreachable. Testing for [[phab:T243106|T243106]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:49 vgutierrez: repooling cp40[26,32]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:39 vgutierrez: pool cp3050 running buster - [[phab:T242093|T242093]]
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:37 vgutierrez: depooling cp[4026,4032]
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 11:18 vgutierrez: depool cp2024 and reimage as buster - [[phab:T242093|T242093]]
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 11:17 vgutierrez: pool cp2025 running buster - [[phab:T242093|T242093]]
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backups
* 11:15 vgutierrez: depool cp2016 and reimage as buster - [[phab:T242093|T242093]]
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backups
* 11:14 vgutierrez: pool cp2019 running buster - [[phab:T242093|T242093]]
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 11:11 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 11:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:50 vgutierrez: depool cp3050 and reimage as buster - [[phab:T242093|T242093]]
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:49 vgutierrez: pool cp30[51,52] running buster - [[phab:T242093|T242093]]
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:45 vgutierrez: depool cp20[19,25] and reimage as buster - [[phab:T242093|T242093]]
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:42 vgutierrez: pool cp2026 running buster - [[phab:T242093|T242093]]
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 10:36 vgutierrez: pool cp2023 running buster - [[phab:T242093|T242093]]
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 10:34 moritzm: bouncing ferm on ganeti1016, failed to start after boot
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 10:32 vgutierrez: Enable KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 10:12 vgutierrez: testing trafficserver 8.0.6-rc0 in cp40[26,32]
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 10:06 vgutierrez: depool cp20[23,26] and reimage as buster - [[phab:T242093|T242093]]
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 10:01 vgutierrez: depool cp30[51-52] and reimage as buster - [[phab:T242093|T242093]]
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 09:38 ema: cp: rolling ats-tls-restart to enable analytics logging [[phab:T237993|T237993]]
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 09:26 ema: cp4027: ats-tls-restart to enable analytics logging to pipe [[phab:T237993|T237993]]
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:25 moritzm: rolling restart of cassandra on restbase-dev to pick up Java security updates
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:17 marostegui: Failover m2 master dbproxy from dbproxy1007 to dbproxy1013 - [[phab:T202367|T202367]]
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:11 marostegui: Upgrade and reboot dbproxy1013 before making it master - [[phab:T202367|T202367]]
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 08:55 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 08:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@9bbbb58]: (no justification provided) (duration: 00m 05s)
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 08:46 phedenskog@deploy1001: Started deploy [performance/navtiming@9bbbb58]: (no justification provided)
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 08:38 marostegui: Restart wikibugs as it doesn't show phab comments on irc - [[phab:T241109|T241109]]
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 08:21 moritzm: installing mesa security updates
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 07:28 vgutierrez: pool cp30[53-54] running buster - [[phab:T242093|T242093]]
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 07:18 oblivian@puppetmaster1001: conftool action : set/weight=30; selector: dc=eqiad,pool=appserver,name=mw132[3-4].*
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 07:16 oblivian@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,pool=appserver,service=nginx,name=mw12[3-5].*
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 20 for  10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10391 and previous config saved to /var/cache/conftool/dbconfig/20200212-070250-marostegui.json
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 06:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 01:50 foks: changing user email for "Region of Peel Archives"
* 06:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 06:46 marostegui: Redact ngwikimedia on db1124:3313 and db2094:3313 [[phab:T240772|T240772]]
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 06:22 vgutierrez: depool cp30[53-54] and reimage as buster - [[phab:T242093|T242093]]
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 01:48 XioNoX: disabling peering session on cr1-eqsin (they're flapping otherwise)
* 00:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/page/ImageHistoryPseudoPager.php: [[phab:T244937|T244937]] ImageHistoryPseudoPager: Update doQuery() for IndexPager changes (duration: 01m 03s)
* 00:38 XioNoX: reboot cr1-eqsin
* 00:33 XioNoX: commit full on cr1-eqsin - [[phab:T243080|T243080]]
* 00:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: rm wgKartographerIconServer (duration: 01m 02s)
* 00:20 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: rm wgKartographerIconServer (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|ee9edf8137}} to {{Gerrit|55b2afb6eb}}, config revision is {{Gerrit|561ae21f77}}


== 2020-02-11 ==
== 2021-10-13 ==
* 22:04 XioNoX: switchover RE mastership back re0 on cr1-eqsin - [[phab:T243080|T243080]]
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - [[phab:T243080|T243080]]
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 21:45 cdanis: repool eqiad
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:36 bblack: re-pooling all cp10xx in eqiad
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 21:32 XioNoX: switchover RE mastership on cr1-eqsin - [[phab:T243080|T243080]]
* 21:47 foks: removing 8 files for legal compliance
* 21:14 robh: cp1067 powered back into service post firmware update via [[phab:T243167|T243167]]
* 21:03 foks: removing 2 files for legal compliance
* 21:11 cdanis: depool eqiad
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via [[phab:T243167|T243167]]
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:51 XioNoX: reboot backup RE on cr1-eqsin - [[phab:T243080|T243080]]
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:38 robh: depooling cp108[45] for firmware update via [[phab:T243167|T243167]]
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 02s)
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 03s)
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( [[phab:T285867|T285867]])
* 20:08 XioNoX: depool eqsin for router upgrade - [[phab:T243080|T243080]]
* 19:08 mutante: gitlab2001 - started workhorse which was for some reason marked as down after restore command ran
* 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 04s)
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfimed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 ([[phab:T233867|T233867]])
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat ([[phab:T240771|T240771]])
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:43 twentyafterfour: getting ready to deploy wmf.18 refs  [[phab:T233866|T233866]]
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:42 greg-g: restarting stashbot
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - [[phab:T242017|T242017]]
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:33 Urbanecm: Create ngwikimedia is done ([[phab:T240771|T240771]])
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 03s)
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia ([[phab:T240771|T240771]])
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R process, sent a message about it to all shell users via wall
* 17:57 bblack: reboot dns2002 post-reimaging
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 17:13 vgutierrez: Disable KA on cp4031 - [[phab:T244464|T244464]]
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:49 vgutierrez: pool cp3055 running buster - [[phab:T242093|T242093]]
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:43 vgutierrez: repooling cp4031
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:38 vgutierrez: depooling cp4031 for some KA tests
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:25 vgutierrez: pool cp3056 running buster - [[phab:T242093|T242093]]
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - [[phab:T242017|T242017]]
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:21 bblack: dns2002 - stopping bird adverts to depool service for [[phab:T242017|T242017]]
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:20 bblack: dns2002 - downtimed in icinga for [[phab:T242017|T242017]]
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:38 vgutierrez: depool cp3056 and reimage as buster - [[phab:T242093|T242093]]
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:36 vgutierrez: pool cp3058 running buster - [[phab:T242093|T242093]]
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - [[phab:T242122|T242122]] (duration: 01m 08s)
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:58 vgutierrez: depool cp3055 and reimage as buster - [[phab:T242093|T242093]]
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:56 vgutierrez: pool cp3057 running buster - [[phab:T242093|T242093]]
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:20 vgutierrez: restart varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - [[phab:T242093|T242093]]
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:52 vgutierrez: pool cp3059 and cp3060 running buster - [[phab:T242093|T242093]]
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 12:34 Amir1: EU SWAT is done
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]], take II, cache (duration: 01m 06s)
* 14:48 moritzm: reverted to clean package state on deneb
* 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]] (duration: 01m 05s)
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]], Second round, cache issue (duration: 01m 07s)
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]] (duration: 01m 11s)
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - [[phab:T242093|T242093]]
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:59 vgutierrez: repool cp3061 and cp3062 running buster - [[phab:T242093|T242093]]
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:56 vgutierrez: depool cp3062 and reimage as buster - [[phab:T242093|T242093]]
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:54 vgutierrez: repool cp3064 running buster - [[phab:T242093|T242093]]
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:51 vgutierrez: depool cp3061 and reimage as buster - [[phab:T242093|T242093]]
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:50 vgutierrez: repool cp5006 and cp3063 running buster - [[phab:T242093|T242093]]
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]'  # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - [[phab:T242093|T242093]]
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 09:52 vgutierrez: depool cp5006 and reimage as buster - [[phab:T242093|T242093]]
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 09:52 vgutierrez: pool cp5007 running buster - [[phab:T242093|T242093]]
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 08:25 marostegui: Upgrade db1095:3312, db1095:3313
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 07:57 akosiaris: [[phab:T242705|T242705]] systemctl stop uwsgi-ores on ores2001.
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 07:47 marostegui: Upgrade es1013 - [[phab:T239791|T239791]]
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 07:23 vgutierrez: depool cp5007 and reimage as buster - [[phab:T242093|T242093]]
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:22 vgutierrez: pool cp5001 and cp5008 running buster - [[phab:T242093|T242093]]
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:21 marostegui: Remove partitions from db2086:3318 - [[phab:T239453|T239453]]
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 06:59 marostegui: Stop haproxy on dbproxy1001 - [[phab:T244463|T244463]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}
* 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:48 marostegui: Remove grants in m1 for dbproxy1001 - [[phab:T231280|T231280]]
* 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - [[phab:T202367|T202367]]
* 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
* 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
* 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)


== 2020-02-10 ==
== 2021-10-12 ==
* 23:30 robh: cp108[23] returned to service via [[phab:T243167|T243167]]
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:28 legoktm: restarting zuul
* 23:16 urbanecm: UTC late B&C window done
* 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 04s)
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 07s)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via [[phab:T243167|T243167]]
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent  --id 2164
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via [[phab:T243167|T243167]]
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:43 arlolra: Updated Parsoid to {{Gerrit|612106d2}} ([[phab:T244412|T244412]], [[phab:T244413|T244413]], [[phab:T242746|T242746]], [[phab:T235273|T235273]], [[phab:T235307|T235307]], [[phab:T238845|T238845]], [[phab:T204618|T204618]], [[phab:T240054|T240054]])
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:38 robh: cp1075 & cp1076 offline for bios updates per [[phab:T243167|T243167]]
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging.  Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}} (duration: 10m 26s)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}}
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]] (duration: 12m 18s)
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]]
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 11s)
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 15s)
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570393]] Config: Session Store: Switch group0 and group1 to kask-session [[phab:T243106|T243106]] (duration: 01m 06s)
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group ([[phab:T244508|T244508]])
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242122|T242122]] Load new EventStreamConfig extension if so configured (duration: 01m 06s)
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242122|T242122]] Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 05s)
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs [[phab:T233867|T233867]]
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244561|T244561]] Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 16:48 moritzm: installing libexif security updates on jessie
* 17:12 moritzm: installing rsync bugfix updates
* 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - [[phab:T242093|T242093]]
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:45 XioNoX: push outbound flowspec support to core routers
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after first day of 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10366 and previous config saved to /var/cache/conftool/dbconfig/20200210-154552-marostegui.json
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:33 godog: roll restart cassandra on session* to apply logging changes - [[phab:T242585|T242585]]
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 15:23 moritzm: uploading debdeploy 0.0.99.13 to apt.wikimedia.org
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 godog: roll restart cassandra on restbase* to apply logging changes - [[phab:T242585|T242585]]
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 15:06 marostegui: Reload haproxy on dbproxy1017 and dbproxy1017 - [[phab:T244209|T244209]]
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 15:04 twentyafterfour@deploy1001: Finished scap: full scap sync prior to wmf.18 rollout (duration: 20m 13s)
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 15:04 godog: roll restart cassandra on maps* to apply logging changes - [[phab:T242585|T242585]]
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:03 vgutierrez: rolling restart of ats-tls - [[phab:T240950|T240950]]
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 15:00 marostegui: Restart mysql on m5 master (wikitech will go down) - [[phab:T244209|T244209]]
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 14:52 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:46 vgutierrez: depool cp5002 and cp5009 and reimage as buster - [[phab:T242093|T242093]]
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 14:44 twentyafterfour@deploy1001: Started scap: full scap sync prior to wmf.18 rollout
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 14:42 vgutierrez: repool cp5003 and cp5010 running buster - [[phab:T242093|T242093]]
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:41 marostegui: Full-upgrade db1133 (without restarting mysql) - [[phab:T244209|T244209]]
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 14:40 twentyafterfour: MediaWiki Train: Running a full scap to prepare for moving forward to 1.35.0-wmf.18 ( [[phab:T233866|T233866]] )
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 14:32 marostegui: Downtime m5 hosts for the upcoming maintenance - [[phab:T244209|T244209]]
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 14:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 14:11 XioNoX: remove TCP-MSS clamping on cr3-knams
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 13:48 vgutierrez: depool cp5003 and reimage as buster - [[phab:T242093|T242093]]
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 13:47 vgutierrez: pooling cp5004 with buster - [[phab:T242093|T242093]]
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 vgutierrez: depool cp5010 and reimage as buster - [[phab:T242093|T242093]]
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:45 vgutierrez: pooling cp5011 with buster - [[phab:T242093|T242093]]
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:28 godog: roll restart cassandra on aqs to apply logging changes - [[phab:T242585|T242585]]
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 13:03 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase: [[gerrit:570911{{!}}Revert "wbterms: Set default for the term store to read new"]] ([[phab:T244529|T244529]]) (duration: 01m 00s)
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 12:58 Urbanecm: EU SWAT is done
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 12:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 58s)
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 12:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 59s)
* 13:05 volans: upgraded spicerack to 1.0.5 on cumin hosts
* 12:49 urbanecm@deploy1001: Finished scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]]) (duration: 20m 18s)
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:30 vgutierrez: depool cp5004 and reimage as buster - [[phab:T242093|T242093]]
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:29 vgutierrez: pooling cp5005 with buster - [[phab:T242093|T242093]]
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:28 urbanecm@deploy1001: Started scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]])
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:23 vgutierrez: pooling ncredir1001 with buster - [[phab:T243391|T243391]]
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:18 _joe_: running puppet, scap pull on mwdebug1001
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 12:17 vgutierrez: upload trafficserver 8.0.5-1wm15 to apt.wm.o (buster) - [[phab:T244538|T244538]]
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 12:08 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 12:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 vgutierrez: testing ats 8.0.5-1-wm15 on cp4032 - [[phab:T244538|T244538]]
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|014405a}}: Add throttle rules for OSU Editathon and workshop for cawiki, remove expired ones ([[phab:T244608|T244608]], [[phab:T244645|T244645]]) (duration: 01m 03s)
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:57 vgutierrez: depool ncredir1001 and reimage as buster - [[phab:T243391|T243391]]
* 11:34 urbanecm: UTC morning B&C window done
* 11:57 vgutierrez: pooling ncredir1002 with buster - [[phab:T243391|T243391]]
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 vgutierrez: pooling cp4027 with buster - [[phab:T242093|T242093]]
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 11:38 vgutierrez: depool ncredir1002 and reimage as buster - [[phab:T243391|T243391]]
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:22 vgutierrez: depooling cp5011 and cp5005 & reimage as buster - [[phab:T242093|T242093]]
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:07 vgutierrez: depool cp4027 & reimage as buster - [[phab:T242093|T242093]]
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 vgutierrez: pooling ncredir2001 with buster - [[phab:T243391|T243391]]
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 vgutierrez: pooling cp4028 with buster - [[phab:T242093|T242093]]
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 10:47 godog: remove old logs from /var/log/swift on swift hosts
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:31 vgutierrez: depool ncredir2001 and reimage as buster - [[phab:T243391|T243391]]
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:26 vgutierrez: depool cp4028 & reimage as buster - [[phab:T242093|T242093]]
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 10:14 moritzm: installing sudo security updates for buster
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 08:53 vgutierrez: pooling cp4029 with buster - [[phab:T242093|T242093]]
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 1 to 5 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10364 and previous config saved to /var/cache/conftool/dbconfig/20200210-084446-marostegui.json
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:43 vgutierrez: pooling ncredir2002 with buster - [[phab:T243391|T243391]]
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 08:34 effie: rolling restart php-fpm on labweb[1001-1002].wikimedia.org,mw*.eqiad.wmnet,scandium.eqiad.wmnet, wtp[1025-1048].eqiad.wmnet
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 08:32 effie: update php-apcu on eqiad - [[phab:T236800|T236800]]
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 08:29 effie: rolling restart php-fpm on cloudweb2001-dev.wikimedia.org,mw[2135-2147,2150-2212,2214-2290].codfw.wmnet,wtp[2001-2020].codfw.wmnet
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 08:23 effie: update php-apcu on codfw - [[phab:T236800|T236800]]
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 07:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 07:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 07:54 moritzm: updating d-i netinst image for Stretch 9.12 point release (which bumped the kernel ABI)
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:29 moritzm: updating d-i netinst image for Buster 10.3 point release (which bumped the kernel ABI)
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:09 elukey: restore mw1347's mcrouter settings to its default (proxy threads 10 -> 5)
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Place db1107 - MariaDB 10.4 on s1 with minimal weight - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10363 and previous config saved to /var/cache/conftool/dbconfig/20200210-070140-marostegui.json
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 06:55 vgutierrez: depool ncredir2002 and reimage as buster - [[phab:T243391|T243391]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019', diff saved to https://phabricator.wikimedia.org/P10362 and previous config saved to /var/cache/conftool/dbconfig/20200210-065326-marostegui.json
* 07:22 moritzm: installing RT security updates
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10361 and previous config saved to /var/cache/conftool/dbconfig/20200210-065135-marostegui.json
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 06:47 vgutierrez: depool cp4029 & reimage as buster - [[phab:T242093|T242093]]
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019', diff saved to https://phabricator.wikimedia.org/P10360 and previous config saved to /var/cache/conftool/dbconfig/20200210-064553-marostegui.json
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10359 and previous config saved to /var/cache/conftool/dbconfig/20200210-064458-marostegui.json
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:39 marostegui: Compress db1124:3318 - this will generate lag on s8 wiki replicas - [[phab:T232446|T232446]]
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10358 and previous config saved to /var/cache/conftool/dbconfig/20200210-063716-marostegui.json
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:23 marostegui: Remove partitions from db1099:3311, db1099:3318 [[phab:T239453|T239453]]
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool  db1099:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10357 and previous config saved to /var/cache/conftool/dbconfig/20200210-062112-marostegui.json
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10356 and previous config saved to /var/cache/conftool/dbconfig/20200210-061822-marostegui.json
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10355 and previous config saved to /var/cache/conftool/dbconfig/20200210-061656-marostegui.json
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2020-02-09 ==
== 2021-10-11 ==
* 05:11 cdanis: [[phab:T238305|T238305]] hardreset cp3051
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 12:53 moritzm: install apache security updates on buster
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 12:04 moritzm: install apache security updates on bullseye
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]


== 2020-02-08 ==
== 2021-10-09 ==
* 19:12 _joe_: set cpufreq governor to performance on mw1328
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:04 _joe_: restarted php7.2-fpm on mw1332
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:53 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 12.24.27.50
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:47 gjg@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Editathon in Charlotte (duration: 00m 58s)
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 00:05 Jeff_Green: switched payments.wikimedia.org to codfw datacenter due to [[phab:T244610|T244610]]
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]


== 2020-02-07 ==
== 2021-10-08 ==
* 22:20 jeh: ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 20:47 mutante: OS install on new install_server VMs worked on second attempt, issues are gone. signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs ([[phab:T224576|T224576]])
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 20:42 jeh: ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 20:32 mutante: ganeti: attempting to reinstall install1003 which failed last time
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 17:36 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync InitializeSettings again for lols refs [[phab:T233866|T233866]] (duration: 01m 03s)
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 17:32 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs [[phab:T233866|T233866]] (duration: 01m 02s)
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 17:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs [[phab:T233866|T233866]]
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 17:19 marostegui: Start MySQL on es1019 after onsite maintenance [[phab:T243963|T243963]]
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 16:38 filippo@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 16:13 XioNoX: remove MSS clamping from eqiad/eqord/knams/esams
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 16:05 andrew@deploy1001: Finished deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]] (duration: 03m 45s)
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 16:04 vgutierrez: pooling cp4030 with buster - [[phab:T242093|T242093]]
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 16:03 bblack: removing GRE MTU mitigations from cp[135]xxx - [[phab:T232602|T232602]]
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 16:01 andrew@deploy1001: Started deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]]
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 15:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 15:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 15:25 vgutierrez: depool & reimage cp4030 as buster - [[phab:T242093|T242093]]
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:21 vgutierrez: pooling cp4031 with buster - [[phab:T242093|T242093]]
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:20 vgutierrez: pooling ncredir3001 running buster - [[phab:T243391|T243391]]
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 15:18 marostegui: Restart all instances on db1124 and db1125 to pick up a new replication filter - [[phab:T240094|T240094]]
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:11 marostegui: Restart all instances on db2094 and db2095 to pick up a new replication filter - [[phab:T240094|T240094]]
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 14:43 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 40s)
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop ([[phab:T244578|T244578]])
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:40 hoo@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 14:38 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 20s)
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 14:33 vgutierrez: depool and reimage ncredir3001 as buster - [[phab:T243391|T243391]]
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 14:32 vgutierrez: depool & reimage cp4031 as buster - [[phab:T242093|T242093]]
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:23 vgutierrez: pooling ncredir3002 running buster - [[phab:T243391|T243391]]
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 13:26 vgutierrez: pooling cp4021 with buster - [[phab:T242093|T242093]]
* 06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 12:51 vgutierrez: depool and reimage ncredir3002 as buster - [[phab:T243391|T243391]]
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 12:42 vgutierrez: depool & reimage cp4021 as buster - [[phab:T242093|T242093]]
* 04:32 ryankemper: [[phab:T292814|T292814]] Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id [[phab:T292814|T292814]]` on `ryankemper@cumin1001` tmux `elastic`
* 12:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 12:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 11:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 11:57 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 11:25 vgutierrez: pooling ncredir5001 running buster - [[phab:T243391|T243391]]
* 04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
* 11:24 vgutierrez: pooling cp4022 with buster - [[phab:T242093|T242093]]
* 04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 11:09 akosiaris: undo wikifeeds experiments
* 04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 11:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:42 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
* 10:36 akosiaris: conduct experiments with stopping/starting uwsgi-ores on ores2001 [[phab:T242705|T242705]]
* 04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
* 10:24 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 10:23 vgutierrez: depool and reimage ncredir5001 as buster - [[phab:T243391|T243391]]
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 10:14 vgutierrez: depool & reimage cp4022 as buster - [[phab:T242093|T242093]]
* 02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' {{!}} mwscript purgeList.php , ref [[phab:T287425|T287425]], [[phab:T292810|T292810]]
* 10:02 akosiaris: increase capacity for wikifeeds by 50% [[phab:T244535|T244535]]
* 00:07 tgr_: deploy window over
* 10:02 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498{{!}}Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)
* 10:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:53 ema: A:mw: increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 09:09 godog: roll restart cassandra instance on restbase-dev
* 09:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:03 godog: restart cassandra on restbase-dev1004 to test logging pipeline onboard
* 09:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 08:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P10343 and previous config saved to /var/cache/conftool/dbconfig/20200207-085846-marostegui.json
* 08:54 marostegui: Upgrade db1090:3312, db1090:3317
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10342 and previous config saved to /var/cache/conftool/dbconfig/20200207-085432-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10341 and previous config saved to /var/cache/conftool/dbconfig/20200207-084447-marostegui.json
* 08:44 moritzm: installing libexif security updates
* 08:21 akosiaris: deploy https://gerrit.wikimedia.org/r/570726 [[phab:T244535|T244535]] to avoid CPU throttling of wikifeeds
* 08:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Increase base weight for db1126', diff saved to https://phabricator.wikimedia.org/P10340 and previous config saved to /var/cache/conftool/dbconfig/20200207-075323-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10339 and previous config saved to /var/cache/conftool/dbconfig/20200207-075234-marostegui.json
* 07:48 marostegui: Remove revision partitions from db2085:3318 [[phab:T239453|T239453]]
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10338 and previous config saved to /var/cache/conftool/dbconfig/20200207-074511-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10337 and previous config saved to /var/cache/conftool/dbconfig/20200207-074407-marostegui.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10336 and previous config saved to /var/cache/conftool/dbconfig/20200207-074258-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10335 and previous config saved to /var/cache/conftool/dbconfig/20200207-073130-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10334 and previous config saved to /var/cache/conftool/dbconfig/20200207-073026-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10333 and previous config saved to /var/cache/conftool/dbconfig/20200207-063831-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10332 and previous config saved to /var/cache/conftool/dbconfig/20200207-063402-marostegui.json
* 06:31 elukey: force a puppet run on all ores[12] nodes
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10331 and previous config saved to /var/cache/conftool/dbconfig/20200207-062731-marostegui.json
* 06:26 marostegui: Reboot db1107 for update - [[phab:T242702|T242702]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10330 and previous config saved to /var/cache/conftool/dbconfig/20200207-062502-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10329 and previous config saved to /var/cache/conftool/dbconfig/20200207-062345-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10328 and previous config saved to /var/cache/conftool/dbconfig/20200207-062043-marostegui.json
* 04:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:24 robh: eqsin pdu work ongoing starting now.  ps1-603 swapping per [[phab:T242250|T242250]]
* 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-06 ==
== 2021-10-07 ==
* 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 3/3 (duration: 00m 55s)
* 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 2/3 (duration: 00m 55s)
* 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 1/3 (duration: 00m 56s)
* 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 2/2 (duration: 00m 56s)
* 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244133|T244133]] [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
* 23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 1/2 (duration: 00m 57s)
* 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T159711|T159711]] [[phab:T161365|T161365]] [[phab:T164435|T164435]] [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
* 21:35 urbanecm: Password reset for SUL User:LA2-bot ([[phab:T292793|T292793]])
* 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:43 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3
* 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281167|T281167]]
* 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:35 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:35 cmooney@cumin1001: START - Cookbook sre.network.cf
* 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Don't trying to assign  to  if it's unset (duration: 01m 07s)
* 20:23 krinkle@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Gadgets/: {{Gerrit|I7c858b8c4bc}} (duration: 00m 56s)
* 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: [[phab:T242184|T242184]] Change tags method so anon edits will go through (duration: 01m 08s)
* 20:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Echo/: {{Gerrit|8a7ff05ba28f302adb581bf430a868bb815b4ffd}}: Revert "Use namespaced CentralAuthSessionProvider" (duration: 00m 57s)
* 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/CentralAuth/: {{Gerrit|c01c2e4983bad8582ddd62aeb35ac9be852d493b}}: Revert "Namespace session providers" (duration: 00m 57s)
* 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:44 urbanecm: Backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/727489, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/727487 in an unsafe way -- exceptions at testwikis expected, wmf.3 is not deployed elsewhere, so this should be ok
* 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert all wikis to 1.38.0-wmf.2 ([[phab:T281167|T281167]])
* 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:33 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): variously blocked, rolling back to testwikis for safe deploy of backports
* 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.38.0-wmf.2
* 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is ([[phab:T242606|T242606]])
* 19:03 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to all wikis
* 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:50 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=test2wiki
* 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:46 sukhe: running authdns-update for [[phab:T292537|T292537]]
* 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs [[phab:T233866|T233866]]
* 18:29 urbanecm: Morning B&C window done
* 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a946c046ae17a520f8d3463a16b1435ceb4856c}}: Deploy Growth mentor dashboard to pilot wikis ([[phab:T278920|T278920]]) (duration: 01m 04s)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 03s)
* 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 04s)
* 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|31770f2b3660e7d7490c0a9ab66285c1f069732d}}: shwiki: Deploy Growth features to newcomers ([[phab:T278240|T278240]]) (duration: 01m 04s)
* 20:52 akosiaris: restart all wikifeeds pods
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|33526dfed148068585289f5ac501feda72068fd9}}: Stream config changes for android_daily_stats schema ([[phab:T286000|T286000]]) (duration: 01m 06s)
* 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:10 ejegg: updated payments-wiki from {{Gerrit|6d3560d083}} to {{Gerrit|030b11da1a}}
* 20:45 akosiaris: restart restbase on restbase1027
* 18:07 arnoldokoth: gitlab2001 re-image complete ([[phab:T283076|T283076]])
* 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 17:30 mutante: rebooting gitlab2001.wikimedia.org
* 20:30 twentyafterfour: sync-wikiversions --force
* 16:56 arnoldokoth: down timing gitlab2001 for re-imaging ([[phab:T283076|T283076]])
* 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Set wgLogoHD before adding wordmark (duration: 01m 06s)
* 16:32 hnowlan: roll restarting maps cassandra instances for java updates
* 19:36 bblack: re-pool cp1075 (eqiad text)
* 16:19 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:33 addshore: SWAT done!
* 16:19 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: [[phab:T244479|T244479]] Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
* 15:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001 (duration: 00m 08s)
* 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
* 15:07 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001
* 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001 (duration: 00m 10s)
* 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
* 14:49 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001
* 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:48 hashar: Upgrading Gerrit replica to 3.3.6 # [[phab:T290236|T290236]]
* 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]], sync again for luck (duration: 01m 06s)
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]] (duration: 01m 07s)
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 10s)
* 13:56 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
* 13:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
* 13:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}} (duration: 06m 27s)
* 13:29 hashar: restarting CI Jenkins for git plugin update
* 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}}
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:14 hashar: Upgraded CI Jenkins on contint2001
* 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:14 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:13 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:32 herron: set performance cpu scaling governor on maps*
* 13:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:49 vgutierrez: pooling ncredir5002 running buster - [[phab:T243391|T243391]]
* 13:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:38 vgutierrez: pooling cp4023 with buster - [[phab:T242093|T242093]]
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
* 13:06 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
* 13:05 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:35 XioNoX: remove AS prepending in esams/knams
* 13:05 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 12:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 12:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 12:40 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
* 12:16 moritzm: installing testvm2005
* 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:59 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 Lucas_WMDE: EU backport+config window (aka UTC morning) done
* 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725858{{!}}Enable Content and Section Translation to Kurdish WP (T290238)]] (duration: 01m 04s)
* 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/WikidataPageBanner/includes/WikidataPageBannerFunctions.php: Backport: [[gerrit:727188{{!}}Change PropertyId to NumericPropertyId (T289125, T292667)]] (duration: 01m 05s)
* 16:07 vgutierrez: depool and reimage ncredir5002 as buster - [[phab:T243391|T243391]]
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 11:10 jbond: update puppet stdlib gerrit:726872
* 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:03 vgutierrez: depool & reimage cp4023 as buster - [[phab:T242093|T242093]]
* 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 16:03 vgutierrez: pooling cp4024 with buster - [[phab:T242093|T242093]]
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2004.codfw.wmnet
* 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
* 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host ms-be2045.codfw.wmnet
* 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2005.codfw.wmnet
* 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]] (duration: 04m 35s)
* 09:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2004.codfw.wmnet
* 15:56 vgutierrez: pooling ncredir4001 running buster - [[phab:T243391|T243391]]
* 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2005.codfw.wmnet
* 15:55 moritzm: installing qemu security updates
* 08:49 mvernon@cumin2002: START - Cookbook sre.experimental.reimage for host ms-be2045.codfw.wmnet
* 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 08:36 moritzm: imported jenkins 2.303.2 to thirdparty/ci component for buster-wikimedia
* 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 07:57 Emperor: re-enabling puppet on ms-be2045 after hw work [[phab:T290881|T290881]]
* 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]]
* 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 07:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:50 moritzm: installing python-ecdsa security updates
* 07:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:34 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 15:41 moritzm: installing jsoup security updates
* 07:33 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 15:30 vgutierrez: depool & reimage ncredir4001 as buster - [[phab:T243391|T243391]]
* 07:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:29 vgutierrez: depool & reimage cp4024 as buster - [[phab:T242093|T242093]]
* 07:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:28 vgutierrez: pooling ncredir4002 running buster - [[phab:T243391|T243391]]
* 06:21 ryankemper: [Elastic] Restart of `relforge` complete
* 15:27 moritzm: installing sudo security updates on jessie
* 06:05 ryankemper: [Elastic] Cluster in green status, proceeding to next and final node => `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 15:23 vgutierrez: pooling cp4025 with buster - [[phab:T242093|T242093]]
* 05:53 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 05:48 ryankemper: [Elastic] Performing rolling restarts of `relforge`. `relforge1003` is the master so I'll restart `relforge1004` first to minimize disruption
* 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:00 ejegg: updated payments-wiki from {{Gerrit|23d0ffac66}} to {{Gerrit|6d3560d083}}
* 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 godog: extend graphite1004 / graphite2003 fs +200G
* 02:28 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: enable Parsoid API everywhere (duration: 01m 04s)
* 14:56 vgutierrez: depool and reimage ncredir4002 as buster - [[phab:T243391|T243391]]
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:46 vgutierrez: depool & reimage cp4025 as buster - [[phab:T242093|T242093]]
* 00:11 mutante: [grafana2001:~] $ sudo systemctl start rsync-var-lib-grafana  because of "PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded" because of some race condition where a file vanished during sync
* 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
* 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump [[phab:T241145|T241145]]
* 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - [[phab:T243963|T243963]]
* 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
* 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
* 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
* 13:31 XioNoX: reboot cr3-knams
* 13:31 elukey: depool mw1347 to test some mcrouter settings
* 13:27 XioNoX: deactivate BGP transits on cr3-knams
* 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - [[phab:T244464|T244464]]
* 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
* 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
* 13:00 addshore: SWAT done
* 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
* 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
* 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]], due to [[phab:T244479|T244479]] (duration: 01m 07s)
* 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 06s)
* 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 07s)
* 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 04s)
* 12:40 vgutierrez: pooling cp3065 - [[phab:T242093|T242093]]
* 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 [[phab:T243395|T243395]] (duration: 01m 07s)
* 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
* 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
* 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
* 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5e1cbb2}}: Enable CX in te, kn, gu, mr and pawiki as a default tool ([[phab:T243271|T243271]], [[phab:T243272|T243272]], [[phab:T243273|T243273]], [[phab:T243274|T243274]], [[phab:T243275|T243275]]) (duration: 01m 09s)
* 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
* 11:38 kart_: Updated cxserver to 2020-02-05-051751-production ([[phab:T244230|T244230]], [[phab:T234323|T234323]])
* 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
* 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
* 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
* 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - [[phab:T244464|T244464]]
* 10:00 vgutierrez: depool and reimage cp3065 as buster - [[phab:T242093|T242093]]
* 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - [[phab:T242093|T242093]]
* 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} (duration: 11m 41s)
* 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}}
* 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet (duration: 00m 29s)
* 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet
* 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
* 08:18 dcausse: restarting blazegraph on wdqs1006: [[phab:T242453|T242453]]
* 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
* 06:46 elukey: run puppet on all ores[12]* nodes
* 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm


== 2020-02-05 ==
== 2021-10-06 ==
* 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
* 23:57 mutante: releases2002 - rm /srv/org/wikimedia/reprepro/conf/distributions - contains only jessie-mediawiki - see 725670 and EOL of MediaWiki 1.31
* 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}} (duration: 10m 48s)
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}}
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) ([[phab:T244389|T244389]])
* 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}} (duration: 03m 07s)
* 23:21 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}}
* 23:20 jforrester@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ckb.svg: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
* 23:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
* 23:16 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726603{{!}}Enable NewUserMessage for ptwikivoyage (T290820)]] (duration: 01m 05s)
* 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
* 22:30 mutante: re-enabling puppet on mw*, an-worker* after deploying gerrit:726954. no issue this time
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
* 22:23 mutante: temp. disabling puppet on an-worker*, mw*
* 20:50 mutante: ores1004 - systemctl start celery-ores-worker
* 20:50 mutante: global puppet failure - revert is merged, puppet run will recover on next run everywhere. partially forcing with cumin, partially letting it recover naturally
* 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 07s)
* 20:43 mutante: [cumin1001:~] $ sudo cumin -b 5 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
* 19:05 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 01m 03s)
* 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
* 19:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 20:25 mutante: mw1267 restarting php7.2-fpm
* 19:01 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): still unblocked after triage meeting, rolling to group1
* 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs [[phab:T233866|T233866]]
* 18:44 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert disabling static mapframes on eswiki (duration: 01m 14s)
* 20:09 moritzm: installing git security updates for jessie
* 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 moritzm: installing unzip security updates
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 mutante: LDAP - added spramduya to wmf group ([[phab:T243802|T243802]])
* 18:31 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eswiki: Disable static mapframes ([[phab:T291736|T291736]]) (duration: 01m 17s)
* 19:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238029|T238029]] Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
* 18:22 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: viwikibooks: Set $wgRestrictDisplayTitle to false ([[phab:T289837|T289837]]) (duration: 01m 21s)
* 19:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync back revert of {{Gerrit|975b4bbb9}} (duration: 01m 06s)
* 17:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 vgutierrez: pooling cp5012 - [[phab:T242093|T242093]]
* 16:53 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:23 vgutierrez: rebooting cp5012 - [[phab:T242093|T242093]]
* 16:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 18:21 elukey: restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
* 16:47 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:51 mutante: ganeti1017 - rebooting (not in use yet)
* 16:43 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to group0
* 17:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/languages/: [[phab:T244300|T244300]] (duration: 01m 13s)
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:33 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/includes/: [[phab:T244300|T244300]] (duration: 01m 14s)
* 16:35 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726596{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 04s)
* 16:53 urandom: Sessionstore deployment (mediawiki-config) is done
* 16:35 jynus: stopping db1127 for hw maintenance [[phab:T292366|T292366]]
* 16:37 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569678]] Config: Enable sessionstore on group0 and 1 [[phab:T243106|T243106]] (duration: 01m 08s)
* 16:31 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 16:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232140|T232140]] Restore wgLogoHD to wikis without a MinervaCustomLogos defined (duration: 01m 09s)
* 16:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 16:07 elukey: update puppet compiler's facts
* 16:28 brennen@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726597{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 10s)
* 15:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 effie: restart php-fpm on canaries - [[phab:T236800|T236800]]
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 effie: Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - [[phab:T236800|T236800]]
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 vgutierrez: depooling & reimaging cp5012 as buster - [[phab:T242093|T242093]]
* 16:01 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 15:12 ema: cp: unset Accept-Encoding from ats-be requests to applayer [[phab:T242478|T242478]]
* 15:45 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): proceeding to deploy backports for [[phab:T292589|T292589]]
* 14:35 vgutierrez: updating acme-chief to version 0.24 - [[phab:T244236|T244236]]
* 15:37 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 14:32 _joe_: restarting mcrouter at nice -19 on mw1331 for testing effects of that change
* 15:35 volans: installed spicerack 1.0.4 on cumin2002
* 14:30 vgutierrez: upload acme-chief 0.24 to apt.wm.o (buster) - [[phab:T244236|T244236]]
* 12:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:26 XioNoX: push initial flowspec config to all routers
* 12:48 volans: uploaded spicerack_1.0.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 14:23 vgutierrez: pooling cp5006 - [[phab:T242093|T242093]]
* 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2004.codfw.wmnet
* 14:13 ema: cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues [[phab:T242478|T242478]]
* 12:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:46 marostegui: Decrease buffer pool size on db1107 for testing - [[phab:T242702|T242702]]
* 12:18 effie: pool mw1455 mw1422
* 13:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:17 urbanecm: wikiadmin@10.64.0.164(viwiki)> delete from growthexperiments_mentee_data; # cleanup after disabling mentor dashboard backend
* 13:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2004.codfw.wmnet
* 13:42 akosiaris: undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1aa67d4846f39f59127a835cb7a8ed2974506025}}: viwiki: Disable mentor dashboard backend ([[phab:T278920|T278920]]) (duration: 01m 06s)
* 13:41 ema: cp1075: unset Accept-Encoding on origin server requests [[phab:T242478|T242478]]
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:39 Amir1: EU SWAT is done
* 11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:38 ema: cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ [[phab:T242478|T242478]]
* 11:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2003.codfw.wmnet
* 13:35 XioNoX: rollback traffic steering off cr2-eqord
* 11:55 XioNoX: esams - Advertise 185.15.59.0/24 instead of 185.15.58.0/23 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 13:29 akosiaris: manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency
* 11:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
* 13:25 XioNoX: reboot cr2-eqord for software upgrade - yaaaaa
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 13:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301{{!}}Cache PropertyInfoLookup internally]] ([[phab:T243955|T243955]]) (duration: 01m 07s)
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 13:17 XioNoX: increase ospf cost for cr2-eqord links
* 10:50 jelto: disable puppet on gitlab1001 to test puppetized code on GitLab replica - [[phab:T283076|T283076]]
* 13:16 vgutierrez: upload acme-chief 0.23 to apt.wm.o (buster) - [[phab:T244236|T244236]]
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 13:15 XioNoX: disable transit/peering BGP sessions on cr2-eqord
* 10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 13:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301{{!}}Cache PropertyInfoLookup internally]] ([[phab:T243955|T243955]]) (duration: 01m 07s)
* 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:10 XioNoX: rollback: disable transit/peering BGP sessions on cr2-eqdfw
* 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:08 vgutierrez: depooling & reimaging cp5006 as buster - [[phab:T242093|T242093]]
* 10:04 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 2/2) (duration: 01m 05s)
* 13:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5cc2b70}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 06s)
* 10:01 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 1/2) (duration: 01m 04s)
* 13:01 XioNoX: reboot cr2-eqdfw for software upgrade
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 13:00 Amir1: SWAT needs more time
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 12:55 XioNoX: disable transit/peering BGP sessions on cr2-eqdfw
* 09:19 jbond: update ipaddress6 fact - https://gerrit.wikimedia.org/r/c/operations/puppet/+/726625
* 12:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|d450288}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 07s)
* 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5cc2b70}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 07s)
* 09:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:725923{{!}}Don't fail job if subscribed wiki is unknown (T292446 T292440)]] (duration: 01m 15s)
* 12:32 awight@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Cite: SWAT: [[gerrit:570285{{!}}Revert follow standardization (T240858)]] (duration: 01m 13s)
* 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 akosiaris: rolling restart of all pods on kubernetes staging cluster to make sure everything is fine after the upgrade
* 08:29 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 10:50 akosiaris: [[phab:T244335|T244335]] upgrade kubernetes-node on kubestage1002.eqiad.wmnet to 1.13.12
* 08:21 XioNoX: add ROAs for 185.15.58.0/24 and 185.15.59.0/24 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 10:43 ema: cp4028: varnish-frontend-restart [[phab:T243634|T243634]]
* 08:04 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 10:24 akosiaris: [[phab:T244335|T244335]] upgrade kubernetes-master on neon.eqiad.wmnet (staging)
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews --fix # [[phab:T291344|T291344]]
* 10:24 effie: Upload php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 - [[phab:T236800|T236800]]
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews # [[phab:T291344|T291344]]
* 10:10 Urbanecm: Run mwscript deleteEqualMessages.php --delete to delete GrowthExperiments' message overrides (cswiki, viwiki, arwiki, kowiki)
* 07:55 urbanecm: mwdebug1001: scap pull ([[phab:T291344|T291344]] fix done)
* 09:57 akosiaris: upload kubernetes 1.13.12 to apt.wikimedia.org stretch-wikimedia/main [[phab:T244335|T244335]]
* 07:51 urbanecm: Staging at mwdebug1001 for [[phab:T291344|T291344]]
* 09:51 effie: install libmemcached-tools on mc-gp* servers - [[phab:T240684|T240684]]
* 05:53 kart_: Updated cxserver to use nodejs12 ([[phab:T290754|T290754]])
* 09:05 ema: add individual FortiGate IPs hitting ulsfo (currently cp4028) to vcl blocked_nets -- trying to identify problematic traffic [[phab:T243634|T243634]]
* 05:47 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:02 marostegui: Replay s1 traffic on db1107 (10.4) [[phab:T242702|T242702]]
* 05:39 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:32 elukey: force a puppet run on ores* hosts
* 05:36 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/pruneChanges.php --wiki wikidatawiki --number-of-days=2
* 06:12 marostegui: Remove partitions from revision table db1098:3317 - [[phab:T239453|T239453]]
* 05:31 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10312 and previous config saved to /var/cache/conftool/dbconfig/20200205-060942-marostegui.json
* 04:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311, db2086:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10311 and previous config saved to /var/cache/conftool/dbconfig/20200205-060911-marostegui.json
* 04:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:38 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕤🍺 sudo varnish-frontend-restart
* 04:29 ryankemper: [WDQS] `wdqs1012` is back up after restarting blazegraph (blazegraph was locked up)
* 04:27 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (attempting to bring downed `wdqs1012` back into health)
* 04:25 ryankemper: [WDQS] Repooling eqiad hosts following the brief outage from earlier: `wdqs1004`, `wdqs1006`, `wdqs1007`
* 03:19 eileen: civicrm revision changed from {{Gerrit|b6f5f71c18}} to {{Gerrit|82efd2e195}}, config revision is {{Gerrit|f4c57d4733}}
* 03:11 tstarling@deploy1002: Synchronized php-1.38.0-wmf.3/includes/CommentFormatter/RowCommentIterator.php: fix UBN [[phab:T292590|T292590]] (duration: 01m 04s)
* 01:39 legoktm: legoktm@mwmaint1002:~$ echo "https://en.wikiversity.org/static/images/mobile/copyright/wikiversity.svg" {{!}}mwscript purgeList.php
* 01:17 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 03s)
* 01:12 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 17s)
* 00:59 arlolra@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable legacy media dom on metawiki (duration: 01m 05s)
* 00:37 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:35 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 03s)
* 00:32 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:29 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 04s)
* 00:16 mutante: puppetmasters: rm /etc/logrotate.d/geoipupdate && systemctl start logrotate && puppet agent -tv
* 00:14 mutante: puppetmaster2002 - rm /etc/logrotate.d/geoipupdate (not managed by puppet anymore but not removed, caused duplicate logrotate config, made logrotate service fail), start logrotate
* 00:08 cstone: civicrm revision changed from {{Gerrit|34d3c3aae8}} to {{Gerrit|b6f5f71c18}}
* 00:01 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725132{{!}}Add WN as an alias to project namespace in Polish Wikinews (T291344)]] (duration: 01m 04s)


== 2020-02-04 ==
== 2021-10-05 ==
* 22:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 23:54 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikiversity.svg: Config: [[gerrit:725413{{!}}Wikiversity Logo Update for 2017 Logo Version (T292109)]] (duration: 01m 03s)
* 22:13 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 32m 03s)
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 04s)
* 22:03 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 23:44 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-az.svg: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 23s)
* 21:41 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 23:16 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725386{{!}}Add image_suggestion_interaction event stream]] (duration: 01m 12s)
* 21:29 twentyafterfour: preparing the new mediawiki branch for deployment to test wikis
* 23:02 legoktm: deleting old stretch docker images from the registry for [[phab:T292485|T292485]]
* 20:31 shdubsh: restart kartotherian on maps2001
* 22:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 20:24 shdubsh: temporarily enable access logs on maps2001
* 22:20 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]) rolling back to testwikis for the day; will revisit in US-morning
* 20:20 twentyafterfour: branching mediawiki to wmf/1.35.0-wmf.18 from commit {{Gerrit|054dd94e97d6}} - train blockers should be added as subtasks under [[phab:T233866|T233866]]
* 20:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 20:06 marxarelli: temporarily holding 1.35.0-wmf.18 [[[phab:T233866|T233866]]] branch cut and train due to concurrent maps prod issues
* 20:44 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/includes/page: Backport: [[gerrit:726594{{!}}Pre-format comments for non-local files too]] ([[phab:T292570|T292570]]) (duration: 01m 04s)
* 19:15 mutante: cp3065 - powercycling
* 20:18 mutante: puppetmaster1003 et al - converting maxmind geoip database fetching from cron to timers
* 18:45 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 20:06 mutante: cumin 'puppetmaster*' "disable-puppet '[[phab:T288844|T288844]] - [[phab:T273673|T273673]] - gerrit:721595 - $<nowiki>{</nowiki>USER<nowiki>}</nowiki>'"
* 17:57 cdanis: ✔️ cdanis@mw1272.eqiad.wmnet ~ 🕐☕ sudo restart-php7.2-fpm
* 19:30 mutante: restoring /home/amire80 from and to mwmaint2002 via Bacula bconsole ([[phab:T292573|T292573]])
* 17:41 akosiaris: reenable kartotherian on maps100*
* 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 17:34 oblivian@cumin1001: conftool action : set/weight=15; selector: cluster=appserver,service=nginx,dc=eqiad,name=mw12[3-5].*
* 19:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3 refs [[phab:T281167|T281167]]
* 17:13 _joe_: restarting php-fpm on mw126[1-3]
* 18:26 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.23 (duration: 01m 57s)
* 17:11 _joe_: restarting php-fpm on mw1266-9
* 18:23 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.21 (duration: 04m 20s)
* 17:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/includes/filerepo/file/ForeignDBFile.php: gerrit: 570089, ongoing incident (duration: 01m 04s)
* 18:21 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]): pruning old branches, starting with 1.37.0-wmf.21, proceeding to 1.37.0-wmf.23 if time allows
* 17:07 _joe_: restarted php-fpm on mw1265 with 80 workers (the default)
* 18:11 ppchelko@deploy1002: Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] Php72ToUpper.php removal (duration: 01m 06s)
* 17:07 _joe_: restarted php-fpm on mw1264 with 240 workers
* 18:04 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] CS.php (duration: 01m 06s)
* 16:52 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase: fix for the recent outage (duration: 01m 21s)
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 45m 59s)
* 16:02 ema: cp: rolling ats-backend-restart to unset Accept-Encoding before sending origin server requests [[phab:T242478|T242478]]
* 17:12 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 14:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 17:09 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 14:18 akosiaris: deploy new wikifeeds chart that is consistent with the current scaffolding approach. No code deploy though.
* 17:03 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 14:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 14:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 17:02 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 14:07 XioNoX: repool ulsfo
* 16:56 brennen: successfully applied security patches for 1.38.0-wmf.3 train ([[phab:T281167|T281167]])
* 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 16:47 brennen: coordinated with deployment backup and starting train prep for 1.38.0-wmf.3 ([[phab:T281167|T281167]]), branched at {{Gerrit|65279490f82c785181b8b6961e40901a4aaafca4}}
* 14:00 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:57 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 13:36 XioNoX: restart cr3-ulsfo for software upgrade
* 15:57 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 13:23 vgutierrez: upgrading acme-chief to version 0.22 - [[phab:T240614|T240614]]
* 15:38 jbond: reimage puppetboard2002
* 13:10 vgutierrez: uploaded acme-chief 0.22 to apt.wm.o (buster) - [[phab:T240614|T240614]]
* 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 13:09 XioNoX: restart cr4-ulsfo for upgrade
* 15:15 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 12:49 XioNoX: depool ulsfo for routers upgrade
* 15:10 moritzm: imported routinator 0.10.1-1bullseye to thirdparty/routinator for bullseye-wikimedia [[phab:T292503|T292503]]
* 10:35 ema: cp4032: varnish-frontend-restart [[phab:T243634|T243634]]
* 14:58 jbond: reimage puppetboard1002
* 09:08 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - [[phab:T243948|T243948]]
* 14:40 effie: depool  mw1455 and mw1422
* 09:07 marostegui: Upgrade s3 codfw master db2105 - [[phab:T239791|T239791]]
* 14:30 Pchelolo: run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php [[phab:T219279|T219279]]
* 08:56 marostegui: Deploy schema change on enwiki eqiad host by host - [[phab:T243804|T243804]]
* 13:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor - Drop REL1_31, start REL1_37 (duration: 00m 57s)
* 08:46 marostegui: Deploy schema change on enwiki codfw - [[phab:T243804|T243804]]
* 13:46 Pchelolo: run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt [[phab:T219279|T219279]]
* 08:16 marostegui: Deploy schema change on testwiki - [[phab:T243804|T243804]]
* 13:39 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 08:13 marostegui: Deploy schema change on test2wiki - [[phab:T243804|T243804]]
* 13:39 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 07:36 marostegui: Upgrade Mariadb on db1107 from 10.4.11 to 10.4.12 [[phab:T242702|T242702]]
* 13:23 ppchelko@deploy1002: Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements [[phab:T219279|T219279]] (duration: 00m 58s)
* 07:15 marostegui: Compress db1126 - [[phab:T232446|T232446]]
* 12:53 ema: upload varnish 6.0.8-1wm1 to apt.wikimedia.org [[phab:T292290|T292290]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10302 and previous config saved to /var/cache/conftool/dbconfig/20200204-071420-marostegui.json
* 12:43 elukey: import AMD ROCm 4.2 to buster-wikimedia's thirdparty/amd-rocm42 - [[phab:T287267|T287267]]
* 07:09 marostegui: Compress db1091 - [[phab:T232446|T232446]]
* 12:24 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10301 and previous config saved to /var/cache/conftool/dbconfig/20200204-070804-marostegui.json
* 11:58 hnowlan: reverted restbase2023 to use CN=hostname certificate due to loading errors
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db2086:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10300 and previous config saved to /var/cache/conftool/dbconfig/20200204-070533-marostegui.json
* 11:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 06:48 elukey: force a puppet run on all ores[12] nodes
* 11:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 00:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [enwiki] Add Commons as an import source [[phab:T242884|T242884]] (duration: 00m 57s)
* 11:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 00:09 mutante: gerrit1002 - replaced ens5 with ens6 in /etc/network/interfaces (IP and row had changed in the past, needed manual fix after reboot and now came back) ;  mkfs.ext4 /dev/vdb on new additional 10GB disk. ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 11:28 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 00:06 jforrester@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [nlwiki] Enable VisualEditor by default for all users [[phab:T161365|T161365]] (duration: 00m 58s)
* 11:17 hnowlan_: disabling puppet on cassandra nodes for rollout of 724061 - defaulting to cn=fqdn certificates
* 00:05 mutante: gerrit1002 - attempt to manually fix /etc/network interfaces , add IP on interface, reboot
* 11:15 effie: upgrade scap to 4.0.2 - [[phab:T291095|T291095]]
* 00:03 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure remainder of testwikis group for kask-session [[phab:T243106|T243106]] (duration: 00m 58s)
* 11:12 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|04524992865b0ae5750eb6fb0a374aa74a65b383}}: Enable local uploads for tcywiki ([[phab:T166763|T166763]]) (duration: 00m 59s)
* 00:02 volans: depool, varnish-frontend-restart, pool on cp4029 (~242k fds) - [[phab:T243634|T243634]]
* 10:11 vgutierrez: update acme-chief to version 0.32 on acmechief hosts - [[phab:T290249|T290249]]
* 10:09 vgutierrez: update acme-chief to version 0.32 on acmechief-test hosts - [[phab:T290249|T290249]]
* 10:06 vgutierrez: upload acme-chief 0.32 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 09:46 hnowlan_: generated cassandra certificate using FQDN for restbase2023
* 09:09 topranks: updating routinator on rpki2001 ([[phab:T291543|T291543]])
* 08:59 dcausse: depool and restart blazegraph on wdqs1007
* 08:51 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 07:58 moritzm: installing apache security updates
* 07:57 elukey: upgrade GPU drivers (AMD ROCm 4.3.1) on an-worker1[096-101]
* 07:27 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:26 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:26 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.wmnet
* 06:38 elukey: reboot an-worker1096 after installing new GPU drivers
* 04:20 eileen: civicrm revision changed from {{Gerrit|d74e9aa0a1}} to {{Gerrit|34d3c3aae8}}, config revision is {{Gerrit|cae09f7691}}


== 2020-02-03 ==
== 2021-10-04 ==
* 23:34 mutante: rebooting gerrit1002 (test VM)
* 23:30 foks: resetting some emails used for abuse by a globally-banned user
* 23:26 mutante: ganeti1003 - sudo gnt-instance modify --disk add:size=10G gerrit1002.wikimedia.org ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 23:19 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 23:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
* 23:18 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 23:21 mutante: gerrit1002 - deleting gerrit.log and gerrit.json files from January to free about 4GB of space ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|75645c9cc59b37dbf59942eabbc014b7dc147626}}: Add explicit config for licensing/copyright message overrides ([[phab:T284097|T284097]]) (duration: 00m 59s)
* 23:12 XioNoX: removing AS15542 from esams
* 23:05 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 22:18 andrew@deploy1001: Finished deploy [horizon/deploy@8bffc7d]: Fix for [[phab:T243355|T243355]] (duration: 03m 29s)
* 22:54 mutante: puppetmaster2001 - rm /etc/logrotate.d/geoipupdate_ipinfo  and geoipupdate_ipinfo ; running puppet, starting logrotate service
* 22:14 andrew@deploy1001: Started deploy [horizon/deploy@8bffc7d]: Fix for [[phab:T243355|T243355]]
* 18:13 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:13 mutante: rebooting ganeti1010, ganeti1011 and other new ganeti machines to pickup microcode mitigations, for some reason the previous reboots did not do it. rescheduled service check on icinga for ganeti1010 and now it recovered ([[phab:T228924|T228924]])
* 16:51 bblack: rolling restart of haproxy for DoTLS on dns300[12],authdns1001,authdns2001 to recycle connections
* 22:05 mutante: ganeti1010 - rebooting host to clear microcode mitigations CPU alert
* 15:24 vgutierrez: pool cp5006
* 21:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.35.0-wmf.15"
* 15:17 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:33 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
* 15:16 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:28 brennen@deploy1001: Synchronized php-1.35.0-wmf.16/includes/TemplateParser.php: Syncing https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569643 for [[phab:T243548|T243548]] (duration: 01m 08s)
* 14:50 phuedx: phuedx@mwmaint1002:~$ mwscript extensions/SecurePoll/cli/purgeDecryptionKeys.php --wiki=votewiki --before="20210101000000"
* 21:14 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T243451|T243451]] (duration: 12m 47s)
* 14:46 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:01 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T243451|T243451]]
* 14:46 effie: uploading scap 4.0.2 - [[phab:T291095|T291095]]
* 20:43 mutante: doc1001 - sudo chown -R doc-uploader:doc-uploader /srv/docroot/
* 14:45 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:19 XioNoX: reactivate L3 only LB in esams/knams
* 14:39 brennen: gitlab: upgrade to 14.3.2 (note there was an additional patch release on 2021-10-01) complete ([[phab:T292256|T292256]])
* 20:19 XioNoX: remove test flowspec rule from cr3-knams
* 14:25 Amir1: cleaning up wb_changes_subscription rows from closed wikis ([[phab:T292440|T292440]])
* 20:13 mutante: doc1001 - re-enabled puppet after merging gerrit:569620 - Git::Clone[integration/docroot]/File[/srv/docroot]/mode: mode changed '2775' to '0755' - Profile::Doc/File[/srv/docroot/org/wikimedia/doc]/group: group changed 'doc-uploader' to 'wikidev', mode changed '0775' to '0755'. needs another follow-up ([[phab:T237707|T237707]])
* 14:24 brennen: gitlab: downtime for upgrade to 14.3.1
* 19:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [officewiki] Enable VisualEditor desktop section editing (duration: 01m 07s)
* 14:19 elukey: import AMD ROCm 4.3.1 packages in buster-wikimedia's thirdparty/amd-rocm431 - [[phab:T287267|T287267]]
* 19:21 Urbanecm: Morning SWAT done
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: {{Gerrit|7b53a52}}: Add gcr, mnw and szy to InterwikiSortOrders (duration: 01m 11s)
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:19 mutante: doc1001 - chown -R doc-uploader:doc-uploader /srv/docroot ; temp. disabled puppet ([[phab:T237707|T237707]])
* 14:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:725905{{!}}Explicitly enable dispatching and pruning for wikidata (T48643)]] (duration: 00m 58s)
* 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|7bb6a12}}: Configure remainder of testwikis group for kask-transition ([[phab:T243106|T243106]]) (duration: 01m 14s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:58 mutante: < bblack> !log doc1001: chown -R nobody:wikidev /srv/docroot {{!}} < mutante> !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc  {{!}} https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304 {{!}} ([[phab:T237707|T237707]])
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:44 bblack: doc1001: chown -R nobody:wikidev /srv/docroot
* 14:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 18:34 brennen: edited /srv/mediawiki-staging/wikiversions.json on deploy1001; scap pull and scap wikiversions-compile on mwdebug1002; revert wikiversions changes on deploy1001.
* 14:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 18:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 14:01 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:725502{{!}}Enable dispatching via jobs everywhere (T48643)]] (duration: 01m 00s)
* 18:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 12:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 12:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725785{{!}}Enable dispatching for wikidatawiki and commonswiki (T292088)]] (duration: 01m 00s)
* 16:52 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 12:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:48 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 12:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:38 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 15:38 XioNoX: rollback: add debug on eqiad-knams link interfaces - [[phab:T240659|T240659]]
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 15:33 XioNoX: add debug on eqiad-knams link interfaces - [[phab:T240659|T240659]]
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 14:59 moritzm: restarting exim on phab* to pick up libidn security update
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 14:55 moritzm: restarting superset on an-tool1004/1005 to pick up libidn security update
* 12:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:44 moritzm: restarting apache on an-tool*, cloudmetrics*, logstash*, grafana1002 to pick up libidn security update
* 11:55 urbanecm: EU B&C window done
* 14:21 moritzm: restarting slapd on ldap-corp* to pick up libidn2 security updates
* 11:55 urbanecm@deploy1002: Synchronized multiversion/MWWikiversions.php: {{Gerrit|508cf5cc6d213373f7c9ba1cdef142ebc8398022}}: Let DB expressions intersect DB lists ([[phab:T290609|T290609]]) (duration: 00m 58s)
* 14:18 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4031.ulsfo.wmnet ~ 🕤☕ sudo varnish-frontend-restart
* 11:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a855078cf52d88cc2cd27a0adc7c6a680c80dd39}}: dewiki, nlwiki: Bump Growth features to 80% ([[phab:T288420|T288420]], [[phab:T285254|T285254]]) (duration: 00m 58s)
* 13:58 moritzm: installing libidn2 security updates
* 11:46 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: {{Gerrit|5728376}}: Update [[phab:T250887|T250887]] mitigations (duration: 00m 58s)
* 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b0a96bed4562bcc975187b1d34626201d407404b}}: Undeploy GettingStarted V: Remove now-obsolete logging channels ([[phab:T235752|T235752]]) (duration: 00m 59s)
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:42 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|9709bcfc8dacbcd1704471df08c31cec0711bea6}}: Undeploy GettingStarted IV: Dont build i18n ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d60f332785868797e7ecc9b5e410616d5604b392}}: Undeploy getting started III: Dont set wmgUseGettingStarted, now ignored ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:37 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|9eaf960c4b7c304be57dfc8d248aca0c6501d04c}}: Undeploy GettingStarted II: Dont load regardless of config ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 13:31 moritzm: rebooting ganeti1009 - ganeti1022 to pick up microcode update [[phab:T228924|T228924]]
* 11:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7405ad1eb323a8da524819f17d6f1a66afaa57}}: Undeploy GettingStarted I: Disable on all wikis ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 12:58 XioNoX: deactivate v6 BGP to AS25596
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724992{{!}}Remove deprecated SectionTranslationTargetLanguage config (T290302)]] (duration: 00m 58s)
* 12:57 moritzm: installing spamassassin security updates
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725042{{!}}Add wikisource-bot.toolforge.org to Commons copy upload list (T292213)]] (duration: 00m 59s)
* 12:53 Urbanecm: Previous message should be "EU SWAT done"
* 11:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720058{{!}}Add IA-Upload tool domains to Commons wgCopyUploadsDomains (T287241)]] (duration: 00m 59s)
* 12:52 Urbanecm: Morning SWAT done
* 11:12 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:52 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki*.png ([[phab:T243509|T243509]])
* 11:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|af0b745}}: Update logo for zh_classical wiki ([[phab:T243509|T243509]]) (duration: 01m 06s)
* 11:07 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:45 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: {{Gerrit|e9387b2}}: Disable MobileFrontend Mainpage special casing on frwiktionary ([[phab:T241888|T241888]]) (duration: 01m 05s)
* 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5f13c19}}: Add minerva custom log for la.wiki ([[phab:T240728|T240728]]; 2/2) (duration: 01m 06s)
* 11:04 effie: depool  wtp1026 for tests
* 12:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|5f13c19}}: Add minerva custom log for la.wiki ([[phab:T240728|T240728]]; 1/2) (duration: 01m 06s)
* 11:04 effie: pool  wtp1025
* 12:35 moritzm: installing openjpeg2 security updates
* 10:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg ([[phab:T233104|T233104]])
* 09:13 akosiaris: hbal -L -G row_C -X on ganeti01.svc.eqiad.wmnet
* 12:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|76e67cd}}: {{Gerrit|e266e25}}: Add wordmarks for szlwiki and etwiki ([[phab:T233104|T233104]], [[phab:T230379|T230379]]) (duration: 01m 06s)
* 08:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 54s)
* 12:29 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|76e67cd}}: {{Gerrit|e266e25}}: Add static wordmarks for szlwiki and etwiki ([[phab:T233104|T233104]], [[phab:T230379|T230379]]) (duration: 01m 06s)
* 08:58 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|32e0356}}: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains ([[phab:T243118|T243118]]) (duration: 01m 07s)
* 07:37 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc] (duration: 06m 14s)
* 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6c48af8}}: Assign editautopatrolprotected to hewiki patrollers ([[phab:T243665|T243665]]) (duration: 01m 06s)
* 07:31 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc]
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6b497e7}}: Wikidata - enable TaintedRefs ([[phab:T241989|T241989]]) (duration: 01m 06s)
* 07:30 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc] (duration: 00m 06s)
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c0ef87}}: Add wgImportSources for hiwikibooks ([[phab:T244022|T244022]]) (duration: 01m 05s)
* 07:30 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc]
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Remove $wgImgAuthDetails=true ([[phab:T153459|T153459]]) (duration: 01m 36s)
* 07:29 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc] (duration: 19m 18s)
* 11:38 ema: powercycle cp3057 [[phab:T244127|T244127]] [[phab:T238305|T238305]]
* 07:19 dcausse: restarting blazegraph on wdqs2001 & wdqs2004 (allocators burning too quickly)
* 10:24 godog: temp disable puppet on cp hosts as precaution for https://gerrit.wikimedia.org/r/c/operations/puppet/+/563977
* 07:18 elukey: depool + restart blazegraph + restart updater for wdqs1006
* 10:08 moritzm: installing sudo security updates on stretch
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1006.wmnet
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1004.wmnet
* 07:10 joal@deploy1002: Started deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc]
* 07:02 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 06:44 elukey: depool + restart blazegraph + restart updater on wdqs1004
* 05:50 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 05:49 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 05:47 ladsgroup@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .


== 2020-02-02 ==
== 2021-10-03 ==
* 19:25 effie: restart varnish on cp4028
* 14:45 _joe_: restarting acmechief on acmechief1001
* 08:48 effie: reboot host analytics1061 - [[phab:T244081|T244081]]
* 12:55 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1127, bad ram', diff saved to https://phabricator.wikimedia.org/P17414 and previous config saved to /var/cache/conftool/dbconfig/20211003-125530-kormat.json
* 08:24 elukey: powercycle cp5006 (unresponsive to ssh, remote tty available but not able to login as root, no prometheus metrics in hours)
* 08:23 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet


== 2020-02-01 ==
== 2021-10-02 ==
* 18:17 effie: pool scb2003, no need for host to stay depooled - [[phab:T244069|T244069]]
* 17:28 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:46 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕐☕ sudo varnish-frontend-restart
* 16:10 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:27 effie: depool scb2003 [[phab:T244069|T244069]]
* 16:51 effie: pool mw1273
* 16:50 effie: pool scb2003
* 16:30 elukey: powerup analytics1073 (attempt to see if it was only a kernel-related crash) - [[phab:T244064|T244064]]
* 16:16 effie: poweroff analytics1073 - [[phab:T244064|T244064]]
* 16:16 effie: poweroff analytics1073 - /T244064
* 16:16 effie: poweroff analytics1073
* 13:00 effie: depool scb2003
* 12:21 effie: depool mw1273
* 01:03 eileen: process-control config revision is {{Gerrit|c3c8bde761}}
* 00:50 eileen: civicrm revision changed from {{Gerrit|fcc5673ee7}} to {{Gerrit|ee9edf8137}}, config revision is {{Gerrit|2a61da0ace}}


== 2020-01-31 ==
== 2021-10-01 ==
* 22:25 eileen: civicrm revision changed from {{Gerrit|ac730a6bcb}} to {{Gerrit|fcc5673ee7}}, config revision is {{Gerrit|2a61da0ace}}
* 23:19 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:14 bstorm_: repooled labsdb1011 now that view work is done
* 22:27 mutante: puppetmaster2001 - systemctl reset-failed
* 22:00 eileen: process-control config revision is {{Gerrit|2a61da0ace}} disabled process-control
* 22:16 mutante: puppetmaster2001 systemctl disable geoip_update_ipinfo.timer
* 21:59 bstorm_: depooled labsdb1011
* 22:15 mutante: puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for [[phab:T288844|T288844]]
* 21:32 bstorm_: updated views on labsdb1010
* 21:56 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:22 bstorm_: updated views on labsdb1009
* 21:44 mutante: puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - [[phab:T288844|T288844]]
* 21:21 bstorm_: updated actor views on labsdb1012
* 21:19 mutante: puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' [[phab:T273673|T273673]]
* 18:17 bblack: repool cp4032 (buster)
* 21:12 mutante: puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet, they are backends. backends don't have the sync cron/job/timer, so noop as well, just like 1004/1005/2004/2005. this just leaves the actual change on 2001  - [[phab:T273673|T273673]]
* 18:17 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
* 21:07 mutante: puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role
* 18:14 bblack: repool cp4029
* 21:06 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s)
* 18:13 bblack: restarted ats-tls and varnish-fe on cp4029
* 21:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend
* 18:05 bblack: depool varnish-fe on cp4029
* 21:05 mutante: puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer
* 18:03 bblack: depool ats-tls on cp4029
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:59 marostegui: Re-enable notifications on the dbstore1005:3318 check [[phab:T243871|T243871]]
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:18 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --sleep 4 --batch-size=25 # In a screen for [[phab:T219301|T219301]]
* 21:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s)
* 03:22 mutante: powercycling crashed cp3063
* 20:58 mutante: temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) [[phab:T273673|T273673]]
* 01:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@322ee4c]: Update mobileapps to {{Gerrit|3eec28d}} (duration: 06m 53s)
* 18:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 01:02 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@322ee4c]: Update mobileapps to {{Gerrit|3eec28d}}
* 18:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 00:41 mutante: contint1001/contint2001 - upgrading jenkins to 2.219
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 00:36 mutante: releases2001: upgrading jenkins to 2.219; install1002: import jenkins 2.219 into jessie-wikimedia APT repo
* 18:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 00:31 mutante: importing jenkins 2.219 to stretch-wikimedia APT repo; releases1001: upgrading jenkins to 2.219
* 18:07 robh@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet
* 18:05 robh@cumin1001: START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet
* 17:58 effie: depool mw1025, mw1319, mw1312 for test
* 16:20 dancy: testing upcoming Scap 4.0.2 release on beta
* 14:04 bblack: C:envoyproxy (appservers and others): restarting envoyproxy
* 14:04 bblack: C:envoyproxy (appservers and others): ca-certificates updated via cumin to workaround [[phab:T292291|T292291]] issues
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 bblack: manually trying LE expired root workaround on mwdebug1001 with puppet disabled ...
* 13:12 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:11 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 13:11 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 11:42 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 jynus: manually migrating some vms out of ganeti1009 to avoid excessive memory pressure
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json
* 10:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s)
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json
* 10:43 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17410 and previous config saved to /var/cache/conftool/dbconfig/20211001-104232-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17409 and previous config saved to /var/cache/conftool/dbconfig/20211001-102841-root.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17408 and previous config saved to /var/cache/conftool/dbconfig/20211001-102728-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17407 and previous config saved to /var/cache/conftool/dbconfig/20211001-101338-root.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17406 and previous config saved to /var/cache/conftool/dbconfig/20211001-101224-root.json
* 10:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad (duration: 00m 51s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17405 and previous config saved to /var/cache/conftool/dbconfig/20211001-095834-root.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17404 and previous config saved to /var/cache/conftool/dbconfig/20211001-095720-root.json
* 09:55 marostegui: Upgrade db1164 and db1177
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 and db1164 for upgrade', diff saved to https://phabricator.wikimedia.org/P17403 and previous config saved to /var/cache/conftool/dbconfig/20211001-095433-marostegui.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17402 and previous config saved to /var/cache/conftool/dbconfig/20211001-094913-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17401 and previous config saved to /var/cache/conftool/dbconfig/20211001-094902-root.json
* 09:38 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force # to get an idea about timing for [[phab:T290609|T290609]], runs in a tmux session under my account
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17400 and previous config saved to /var/cache/conftool/dbconfig/20211001-093410-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17399 and previous config saved to /var/cache/conftool/dbconfig/20211001-093358-root.json
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17398 and previous config saved to /var/cache/conftool/dbconfig/20211001-091906-root.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17397 and previous config saved to /var/cache/conftool/dbconfig/20211001-091854-root.json
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17396 and previous config saved to /var/cache/conftool/dbconfig/20211001-090402-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17395 and previous config saved to /var/cache/conftool/dbconfig/20211001-090351-root.json
* 09:02 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 09:00 _joe_: restarting pybal low-traffic in eqiad to pick up the drop of proxyfetch to kubernetes services
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17394 and previous config saved to /var/cache/conftool/dbconfig/20211001-084859-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17393 and previous config saved to /var/cache/conftool/dbconfig/20211001-084847-root.json
* 08:44 marostegui: Upgrade db1135 and db1172
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for upgrade', diff saved to https://phabricator.wikimedia.org/P17392 and previous config saved to /var/cache/conftool/dbconfig/20211001-084435-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for upgrade', diff saved to https://phabricator.wikimedia.org/P17391 and previous config saved to /var/cache/conftool/dbconfig/20211001-084411-marostegui.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080 [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17390 and previous config saved to /var/cache/conftool/dbconfig/20211001-084345-marostegui.json
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 08:15 _joe_: restarting pybal in codfw to pick up config changes
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17388 and previous config saved to /var/cache/conftool/dbconfig/20211001-062846-root.json
* 06:27 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17387 and previous config saved to /var/cache/conftool/dbconfig/20211001-062453-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17386 and previous config saved to /var/cache/conftool/dbconfig/20211001-061342-root.json
* 06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17385 and previous config saved to /var/cache/conftool/dbconfig/20211001-060949-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17384 and previous config saved to /var/cache/conftool/dbconfig/20211001-055838-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17383 and previous config saved to /var/cache/conftool/dbconfig/20211001-055445-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17382 and previous config saved to /var/cache/conftool/dbconfig/20211001-054335-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17381 and previous config saved to /var/cache/conftool/dbconfig/20211001-053942-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17380 and previous config saved to /var/cache/conftool/dbconfig/20211001-052831-root.json
* 05:26 marostegui: Upgrade db1114
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for upgrade', diff saved to https://phabricator.wikimedia.org/P17379 and previous config saved to /var/cache/conftool/dbconfig/20211001-052509-marostegui.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17378 and previous config saved to /var/cache/conftool/dbconfig/20211001-052438-root.json
* 05:22 marostegui: Upgrade db1119
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17377 and previous config saved to /var/cache/conftool/dbconfig/20211001-052133-marostegui.json
* 04:00 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox on Commons for 10% of requests ([[phab:T289228|T289228]]) (duration: 00m 59s)
* 04:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:24 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 03:15 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2020-01-30 ==
== 2021-09-30 ==
* 19:37 mutante: copying /var/log/apache2 to /root on all eqiad mw appservers to preserve logs
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 vgutierrez: depool cp4032 and perform a rolling restart of varnish-fe at cp4027-cp4031 - [[phab:T243634|T243634]]
* 23:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:51 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/Sql/Terms/FingerprintableEntityTermStoreTrait.php: wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds ([[phab:T243944|T243944]]) (duration: 01m 06s)
* 23:51 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Put a https protocol into values (duration: 01m 00s)
* 17:49 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/repo/maintenance/rebuildItemTerms.php: wbterms: Write only to the new term store in rebuildItemTerms ([[phab:T243944|T243944]]) (duration: 01m 09s)
* 23:48 dpifke@deploy1002: Finished deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 00m 05s)
* 17:03 vgutierrez: repooling cp4032 - [[phab:T243634|T243634]]
* 23:48 dpifke@deploy1002: Started deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 17:02 vgutierrez: restarting varnish-frontend on cp4031 before it crashes - [[phab:T243634|T243634]]
* 23:41 dpifke@deploy1002: Finished deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 01m 07s)
* 16:26 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - [[phab:T243948|T243948]]
* 23:40 dpifke@deploy1002: Started deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 12:22 arturo: add prometheus 2.7.1+ds-3+k8s+buster to buster-wikimedia [[phab:T238096|T238096]] (basically a rebuild from stretch)
* 23:39 dpifke@deploy1002: Finished deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 00m 05s)
* 06:23 vgutierrez: restarting varnish-frontend on cp4030 before it crashes - [[phab:T243634|T243634]]
* 23:39 dpifke@deploy1002: Started deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 06:21 vgutierrez: depool cp4032 - [[phab:T243634|T243634]]
* 23:34 ejegg: updated Fundraising CiviCRM from {{Gerrit|d4da344274}} to {{Gerrit|d74e9aa0a1}}
* 05:12 vgutierrez: restarting varnish-frontend and repooling cp4029 - [[phab:T243634|T243634]]
* 22:09 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 05:00 vgutierrez: depooling cp4029
* 22:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 22:06 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 21:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 21:06 eileen: civicrm revision changed from {{Gerrit|2ecb8f0bcd}} to {{Gerrit|d4da344274}}, config revision is {{Gerrit|77cb7ec866}}
* 20:54 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo pool` (merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/725110 to unbreak readiness probe)
* 20:54 topranks: Routinator on rpki1001 upgraded to 0.10.0 and working again after force refresh.
* 20:49 brennen: gitlab1001: upgrade to 14.2.5 complete
* 20:32 brennen: gitlab2001, gitlab1001: downtime for upgrades to 14.2.5
* 20:18 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo depool` (not sure why pybal can't depool it, the other 2 servers are pooled)
* 19:51 topranks: Updating routinator on rpki1001 [[phab:T291543|T291543]]
* 19:39 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:37 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]]
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/MobileFrontend: Backport: [[gerrit:724979{{!}}Fix search within pages alignment (T292107)]] (duration: 01m 09s)
* 19:05 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/EventBus/includes/EventBus.php: Backport: [[gerrit:724481{{!}}Guard against undefined index notice when setting x-client-ip (T288853)]] (duration: 01m 09s)
* 19:04 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/EventBus/includes/EventBus.php: Backport: [[gerrit:724480{{!}}Guard against undefined index notice when setting x-client-ip (T288853)]] (duration: 01m 09s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:58 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/skins/Vector/resources/skins.vector.styles.legacy/components/MenuDropdown.less: Backport: [[gerrit:724798{{!}}Restore original more menu padding in legacy Vector (T289163)]] (duration: 01m 08s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:43 thcipriani@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
* 18:42 moritzm: imported gitlab 14.2.5 to thirdparty/gitlab [[phab:T292219|T292219]]
* 18:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:38 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] Part III (duration: 01m 07s)
* 18:37 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania-wordmark.svg: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] Part II (duration: 01m 07s)
* 18:35 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania.svg: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] part I (duration: 01m 07s)
* 18:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:724514{{!}}Enable sticky header on beta cluster (T289721)]] (duration: 01m 08s)
* 18:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thorium.eqiad.wmnet
* 18:22 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:20 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724861{{!}}Disable legacy media dom on a few more wikis (T51097)]] (duration: 01m 08s)
* 18:07 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:49 otto@cumin1001: START - Cookbook sre.hosts.decommission for hosts thorium.eqiad.wmnet
* 17:42 bstorm: updating packages for thirdparty/kubeadm-k8s-1-20 and thirdparty/kubeadm-k8s-1-19 in stretch-wikimedia on apt1001 [[phab:T292131|T292131]]
* 17:09 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 55s)
* 17:08 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 08s)
* 17:02 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 17:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 11s)
* 17:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 16:49 sukhe: restart dnsdist.service on doh[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002].wikimedia.org
* 16:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10% (duration: 02m 33s)
* 16:40 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10%
* 16:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 40s)
* 16:37 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 hnowlan: Ran `GRANT pg_monitor TO prometheus` for maps in eqiad and codfw to fix empty prometheus connection metrics
* 16:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 16s)
* 16:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725032{{!}}Disable jQuery migrate in metawiki (T280944)]] (duration: 01m 09s)
* 16:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725019{{!}}Enable dispatching via job to 10 prod wikis]] (duration: 01m 09s)
* 15:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:36 elukey: drop /etc/helmfile-defaults/private/backup_old_paths from deploy1002 (old data not needed anymore)
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17374 and previous config saved to /var/cache/conftool/dbconfig/20210930-143325-root.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17373 and previous config saved to /var/cache/conftool/dbconfig/20210930-143044-root.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17372 and previous config saved to /var/cache/conftool/dbconfig/20210930-141822-root.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17370 and previous config saved to /var/cache/conftool/dbconfig/20210930-141540-root.json
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17369 and previous config saved to /var/cache/conftool/dbconfig/20210930-140318-root.json
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17368 and previous config saved to /var/cache/conftool/dbconfig/20210930-140037-root.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17367 and previous config saved to /var/cache/conftool/dbconfig/20210930-134815-root.json
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17366 and previous config saved to /var/cache/conftool/dbconfig/20210930-134533-root.json
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 13:40 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:36 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17365 and previous config saved to /var/cache/conftool/dbconfig/20210930-133311-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17364 and previous config saved to /var/cache/conftool/dbconfig/20210930-133029-root.json
* 13:29 marostegui: Upgrade db1111
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for upgrade', diff saved to https://phabricator.wikimedia.org/P17363 and previous config saved to /var/cache/conftool/dbconfig/20210930-132831-marostegui.json
* 13:27 marostegui: Upgrade db1134
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17362 and previous config saved to /var/cache/conftool/dbconfig/20210930-132700-marostegui.json
* 13:26 marostegui: Upgrade db1133
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 13:02 urbanecm: Start server-side upload for 2 video files ([[phab:T292096|T292096]], [[phab:T291492|T291492]])
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17361 and previous config saved to /var/cache/conftool/dbconfig/20210930-130116-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17360 and previous config saved to /var/cache/conftool/dbconfig/20210930-130109-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17359 and previous config saved to /var/cache/conftool/dbconfig/20210930-124612-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17358 and previous config saved to /var/cache/conftool/dbconfig/20210930-124606-root.json
* 12:31 Reedy: downloading files for [[phab:T290900|T290900]] in screen on mwmaint1002
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17357 and previous config saved to /var/cache/conftool/dbconfig/20210930-123109-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17356 and previous config saved to /var/cache/conftool/dbconfig/20210930-123101-root.json
* 12:18 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 17s)
* 12:18 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:17 moritzm: adapted MX records to point to both mx1001.wikimedia.org and mx2001.wikimedia.org with equal weights [[phab:T286911|T286911]]
* 12:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 16s)
* 12:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17355 and previous config saved to /var/cache/conftool/dbconfig/20210930-121605-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17354 and previous config saved to /var/cache/conftool/dbconfig/20210930-121558-root.json
* 12:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 10s)
* 12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 01s)
* 12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17353 and previous config saved to /var/cache/conftool/dbconfig/20210930-120102-root.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17352 and previous config saved to /var/cache/conftool/dbconfig/20210930-120054-root.json
* 12:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:58 hnowlan: imported wikidiff2_1.13.0-1/php-wikidiff2_1.13.0-1_amd64.deb to buster-wikimedia component/php72
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1 and s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17351 and previous config saved to /var/cache/conftool/dbconfig/20210930-115631-marostegui.json
* 11:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 03s)
* 11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
* 11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
* 11:46 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:44 effie: downgrading scap to 3.17.1-1 on maps* hosts - [[phab:T291990|T291990]]
* 11:43 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724732{{!}}Make reply tool available as opt-out almost everywhere (phase 3) (T288485)]] (duration: 01m 07s)
* 11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:35 kartik@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools: Backport: [[gerrit:724789{{!}}Add a link to preferences within the Reply and New Discussion Tools (T291002)]] (duration: 01m 08s)
* 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:30 kartik@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools: Backport: [[gerrit:724788{{!}}Add a link to preferences within the Reply and New Discussion Tools (T291002)]] (duration: 01m 09s)
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724458{{!}}Enable SectionTranslation in Igbo, Hausa, Yoruba Wikipedias (T290175)]] (duration: 01m 08s)
* 11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:13 akosiaris: upgrade znuny to 6.0.37
* 10:06 godog: test bounce logstash on logstash1023
* 08:21 moritzm: installing nettle security updates on stretch
* 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
* 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 07:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 06s)
* 07:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 07:03 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:56 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:48 marostegui: Deploy schema change on s8 codfw (lag will show up) [[phab:T270620|T270620]]
* 06:01 marostegui: Deploy schema change on s1 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:53 marostegui: Deploy schema change on s3 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:52 marostegui: Deploy schema change on s7 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:47 marostegui: Deploy schema change on s5 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:45 marostegui: Deploy schema change on s4 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:45 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T270620|T270620]]


== 2020-01-29 ==
== 2021-09-29 ==
* 23:37 marostegui: Remove partitions from db2087:3317 - [[phab:T239453|T239453]]
* 23:20 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:17 XioNoX: move knams netflow sampling to cr3-knams
* 23:05 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:19 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ice8dad2}} (duration: 01m 10s)
* 23:02 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:11 vgutierrez: varnish-frontend restarted on cp4031
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:09 vgutierrez: repool cp4031
* 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 marostegui: Disable notifications for dbstore1005:3318 slave lag - [[phab:T243871|T243871]]
* 21:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Catch TimelineException from fixMap() ([[phab:T292126|T292126]]) (duration: 01m 07s)
* 01:03 vgutierrez: depool cp4031
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10289 and previous config saved to /var/cache/conftool/dbconfig/20200129-003507-marostegui.json
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10288 and previous config saved to /var/cache/conftool/dbconfig/20200129-002203-marostegui.json
* 21:37 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Bump Timeline::CACHE_VERSION (duration: 01m 08s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]] (duration: 01m 08s)
* 20:21 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 jhuneidi@deploy1002: Finished scap: Fix pywikibot feature detection (duration: 13m 38s)
* 20:02 jhuneidi@deploy1002: Started scap: Fix pywikibot feature detection
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:06 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/scripts/renderTimeline.sh: Fix passing temp directory to EasyTimeline.pl (duration: 01m 07s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 dancy@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/resources/skins.minerva.base.styles/ui.less: Backport: [[gerrit:724787{{!}}Search header should be vertically centered, not top aligned(take 2) (T292071)]] (duration: 01m 08s)
* 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724776{{!}}Fully enable change dispatching via jobs on test wikis]], Part I (duration: 01m 09s)
* 17:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:724776{{!}}Fully enable change dispatching via jobs on test wikis]], Part I (duration: 01m 07s)
* 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
* 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:43 akosiaris: start hbal -L -G row_B -X on ganeti01.svc.codfw.wmnet . Rows C and D are fine
* 16:42 akosiaris: start hbal -L -G row_A -X on ganeti01.svc.codfw.wmnet
* 16:40 akosiaris: migrate kubemaster2001 off ganeti2007 and to ganeti2008 due to memory starvation on ganeti2007
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/WikimediaBadges/: Backport: [[gerrit:724561{{!}}Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953)]] (duration: 01m 08s)
* 16:24 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/WikimediaBadges/: Backport: [[gerrit:724560{{!}}Handle missing items in WikibaseClientSiteLinksForItemHandler (T291953)]] (duration: 01m 10s)
* 15:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host thumbor2006.codfw.wmnet
* 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:45 Amir1: disabled cron dispatching for mediawikiwiki
* 15:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724765{{!}}Enable change dispatching via jobs in wikidatawiki (T48643)]] (duration: 01m 08s)
* 15:44 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 15:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 15:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/client: Backport: [[gerrit:724558{{!}}Track time until dispatched recent changes are inserted (T291962)]] (duration: 01m 10s)
* 15:24 pt1979@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host thumbor2006.codfw.wmnet
* 15:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:08 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:04 pt1979@cumin2002: START - Cookbook sre.experimental.reimage for host thumbor2006.codfw.wmnet
* 14:01 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:34 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 13:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:11 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:09 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .