Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: Revert "Turn on glent m1 AB test" T262612 (duration: 00m 58s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(110 intermediate revisions by 2 users not shown)
== 2021-04-01 ==
== 2021-08-03 ==
* 23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:676350{{!}}Revert "Turn on glent m1 AB test"]] [[phab:T262612|T262612]] (duration: 00m 58s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to {{Gerrit|1be781d}} (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 57s)
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 23:10 thcipriani@deploy1002: Synchronized logos: Backport: Part II [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 57s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:08 thcipriani@deploy1002: Synchronized static: Backport: Part I [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 59s)
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2248.codfw.wmnet
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:50 twentyafterfour@deploy1002: Finished deploy [releng/phatality@27ddd0b]: deploy phatality (duration: 00m 13s)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:50 twentyafterfour@deploy1002: Started deploy [releng/phatality@27ddd0b]: deploy phatality
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:49 twentyafterfour: deploying phatality
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2248.codfw.wmnet
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2246.codfw.wmnet
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2246.codfw.wmnet
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2243.codfw.wmnet
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2243.codfw.wmnet
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:42 mutante: mw2243, mw2246, mw2247, mw2248 - depooled - replaced by mw2379, mw2380, mw2381, mw2382 ([[phab:T277780|T277780]])
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2382.codfw.wmnet
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2381.codfw.wmnet
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 20:01 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 04s)
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 20:01 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 razzi@deploy1002: deploy aborted: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1hv (duration: 00m 00s)
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 20:01 mutante: mw2379, mw2380, mw2381, mw2382 - scap pull
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2382.codfw.wmnet
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2381.codfw.wmnet
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:59 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 21s)
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2379.codfw.wmnet
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:58 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 19:56 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 12s)
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 19:56 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:51 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 19:37 mutante: pooled parse2001 again after twentyafterfour rebuilt the l10n cache for wmf.37, which fixed it and made the Apache alert recover ([[phab:T268524|T268524]])
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 19:34 mutante: mw2379, mw2380, mw2381, mw2382 - rebooting
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 19:34 twentyafterfour@deploy1002: scap sync-l10n completed (1.36.0-wmf.37) (duration: 02m 38s)
* 16:59 hashar: Gerrit has been upgraded
* 19:30 mutante: depooled parse2001 because on train deployment it caused "MWException: No localisation cache found for English" and then "HTTP CRITICAL: HTTP/1.1 500 Internal Server Error" ([[phab:T268524|T268524]])
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 19:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 19:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 19:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 16:45 hashar: Stopping Gerrit for upgrade
* 19:21 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 18:59 mutante: creating mcrouter certs for mw2379 through mw2382
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 18:35 Urbanecm: Morning B&C window done
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 18:33 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo/resources/mediasearch-vue/components/base/Dialog.vue: {{Gerrit|e77f2b98a4fcb7d9cf74c45caeb7cfbc68a063d0}}: Use appendChild() instead of append() ([[phab:T278448|T278448]]) (duration: 01m 09s)
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b485d1ca6779a03912345a094fa1101cef5f091a}}: Enable SandboxLink extension in ptwikinews ([[phab:T278634|T278634]]) (duration: 01m 12s)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:59 Urbanecm: Start server-side upload of two files ([[phab:T279082|T279082]], [[phab:T279081|T279081]])
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1007.eqiad.wmnet
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a7acf3357d5d148bad11a2d2718b4da56e1a0cb8}}: hrwiki: Fix help panel links ([[phab:T275684|T275684]]) (duration: 01m 10s)
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 15:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 15:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 15:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 15:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 15:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 15:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
* 12:47 moritzm: restarting Tomcat on idp1001
* 15:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
* 12:05 moritzm: installing libgcrypt20 security updates
* 14:52 volans: uploaded python3-wmflib_0.0.7 to bullseye-wikimedia
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 14:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 14:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 14:22 effie: disable puppet on mw* canaries, rolling depool and pooling of canaries
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 14:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 14:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:59 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 13:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 13:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 13:24 ema: cp3054: reboot with Linux 4.19.181+1 -- the kernel was not upgraded earlier during [[phab:T273278|T273278]] reboots due to broken dpkg status
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 13:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 12:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 08:57 moritzm: installing pillow security updates on stretch
* 12:53 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 12:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 12:47 moritzm: drain ganeti1022
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 12:46 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 12:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:40 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 12:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:23 moritzm: drain ganeti1021
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 12:15 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 11:59 Urbanecm: Start server upload of two video files (~4 GB in total) # [[phab:T278856|T278856]]
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 11:55 moritzm: drain ganeti1020
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675993{{!}}Disable RelatedArticles on Timeless skin on German Wikipedia]] ([[phab:T278611|T278611]]) (duration: 01m 08s)
* 11:41 moritzm: drain ganeti1019
* 11:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
* 11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
* 11:23 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674820{{!}}Enable MediaSearch by default for anonymous users]] (duration: 01m 10s)
* 11:20 moritzm: drain ganeti1018
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
* 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
* 11:00 moritzm: drain ganeti1017
* 10:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
* 10:39 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
* 10:33 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
* 10:33 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
* 10:26 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
* 09:07 hashar: contint2001: compressing files with 4 parallel executions:  sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -print0{{!}}xargs -0 -P4 gzip
* 09:01 hashar: contint2001: compressing all fresnel trace--trace.json files: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip <nowiki>{</nowiki><nowiki>}</nowiki> \+  # [[phab:T249268|T249268]]
* 08:52 moritzm: drain ganeti1011
* 08:35 moritzm: failover Ganeti master in eqiad to ganeti1009
* 08:25 moritzm: installing ldb security updates
* 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 08:09 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 07:55 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 06:37 elukey: powercycle cp1087 (no ssh, no tty via serial console) - [[phab:T278729|T278729]]
* 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 02:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
* 02:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
* 02:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
* 02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
* 02:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 02:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
* 01:52 Reedy: `echo "https://www.mediawiki.org/static/images/footer/poweredby_mediawiki_176x62.png" {{!}} mwscript purgeList.php --wiki=enwiki` [[phab:T268230|T268230]]
* 01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
* 01:51 Reedy: `echo "https://www.mediawiki.org/favicon.ico" {{!}} mwscript purgeList.php --wiki=enwiki` [[phab:T268230|T268230]]
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 01:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 01:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
* 01:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
* 00:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 00:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 00:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 00:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 00:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:08 legoktm: uploaded mailman3 3.2.1-1+wmf1, postorius 1.2.4-1+wmf1 to apt.wikimedia.org
* 00:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
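The 09:07 entry above compresses the fresnel trace files with four parallel gzip processes (`find ... -print0 | xargs -0 -P4 gzip`). The following is a minimal Python sketch of the same idea, not the command that was run on contint2001. The function names `gzip_file` and `compress_traces` are hypothetical, and the use of worker threads instead of separate processes is an assumption made to keep the example short.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def gzip_file(path: Path) -> Path:
    """Compress one file to <name>.gz and delete the original, as gzip(1) does."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


def compress_traces(root: Path, pattern: str = "*trace.json", workers: int = 4) -> List[Path]:
    """Recursively find files matching `pattern` under `root` and gzip them.

    `workers` plays the role of the -P4 option to xargs: at most that many
    files are compressed at the same time.
    """
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, files))
```

Calling `compress_traces(Path("/srv/jenkins/builds/mediawiki-fresnel-patch-docker"))` would mirror the logged command's search root and pattern. Running it as the owning user (the log uses `sudo -u jenkins`) is left to the caller.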


== 2021-03-31 ==
== 2021-08-02 ==
* 23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|bfc8f55196f57e43c0abc8a16d81cb3b390ac94a}}: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|ad564a098f9174d76ff5c95adec20064ddde7bc9}}: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:674698{{!}}Include private folder in restricted image (T276145)]] (duration: 01m 08s)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:668241{{!}}Use the new mediawiki logos]], part II ([[phab:T268230|T268230]]) (duration: 01m 11s)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:03 ladsgroup@deploy1002: Synchronized static: [[gerrit:668241{{!}}Use the new mediawiki logos]], part I ([[phab:T268230|T268230]]) (duration: 01m 09s)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:58 Urbanecm: Start server side upload for 3 files
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:01 Urbanecm: Server side upload of three video files ([[phab:T279011|T279011]], [[phab:T278956|T278956]], [[phab:T278955|T278955]])
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:01 eileen: civicrm revision changed from {{Gerrit|2fcea570bd}} to {{Gerrit|740e49d868}}, config revision is {{Gerrit|6779e3829a}}
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 01m 08s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs  [[phab:T278376|T278376]] [[phab:T278343|T278343]] (duration: 00m 58s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36  refs [[phab:T278343|T278343]]
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 19:00 urbanecm: Morning B&C window completed
* 17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs [[phab:T278343|T278343]]
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 17:01 Urbanecm: Server side upload of three video files ([[phab:T278959|T278959]], [[phab:T278958|T278958]], [[phab:T278957|T278957]])
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:57 papaul: disconnecting ps1-d8-codfw for replacement
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 Urbanecm: Server side upload of two video files ([[phab:T278961|T278961]], [[phab:T278960|T278960]])
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 13:48 jynus: retrying s3 snapshot on codfw
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for [[phab:T278220|T278220]]
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing [[phab:T278220|T278220]]
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 11:38 awight: EU deployment complete
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: [[gerrit:675882{{!}}Style change to mediasearch logged-in notice close (T274927)]] [[gerrit:675883{{!}}Suppress user notice on mobile (T274927)]] [[gerrit:675881{{!}}Reset namespace filter on cancel (T276261)]] (duration: 01m 08s)
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:675509{{!}}vector: Disable WVUI search widget treatment A/B test (T276917)]] (duration: 01m 08s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:48 effie: enable puppet on all mw* servers
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 10:10 effie: disable puppet on all mw* hosts
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 09:03 hashar: contint2001: enable puppet again
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:35 eileen: civicrm revision changed from {{Gerrit|7040b68c11}} to {{Gerrit|2fcea570bd}}, config revision is {{Gerrit|6779e3829a}}
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2021-03-30 ==
== 2021-07-31 ==
* 23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings ([[phab:T278609|T278609]])
* 23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef306a35464f295f43b874301cf0170edcfa4d8c}}: Growth features: bnwiki: Enable impact module ([[phab:T274793|T274793]]) (duration: 01m 07s)
* 22:52 cstone: civicrm revision changed from {{Gerrit|ad430721f6}} to {{Gerrit|7040b68c11}}
* 21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
* 21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
* 21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
* 21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
* 21:02 legoktm: scap pulling on mw1298
* 20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
* 20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:58 legoktm: killed remaining ffmpeg on mw1298
* 20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
* 20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
* 20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
* 20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
* 20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
* 20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 80m 32s)
* 20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
* 20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
* 20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
* 20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
* 20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
* 20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
* 20:02 twentyafterfour: while syncing the 1.36.0-wmf.37 promotion to testwikis, one server failed (mw1298.eqiad.wmnet) and two more appear to be hung: scap has been stuck at 2 left 99% without making any progress for a long time. refs [[phab:T278343|T278343]]
* 19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
* 19:58 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
* 17:19 legoktm: killed all ffmpeg on mw1294
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
* 17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:40 effie: enable puppet on mw* hosts
* 16:10 mutante: mw1296 - started ferm
* 16:10 mutante: mw1308 - started ferm
* 16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
* 16:07 mutante: mw1309 - systemctl start ferm
* 16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 15:59 akosiaris: depool a number of hosts from videoscalers
* 15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 15:29 hnowlan: moving all test tables out of cassandra directories on aqs hosts
* 14:59 effie: disable puppet on mediawiki servers to deploy 663565
* 14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki ([[phab:T278350|T278350]])
* 14:32 arturo: manually start update-openstack-mirror.service on sodium ([[phab:T278505|T278505]])
* 13:02 jbond42: rollout lxml update [[phab:T278822|T278822]]
* 12:55 jbond42: update spamassassin on lists, otrs and mx [[phab:T278820|T278820]]
* 12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait ([[phab:T277060|T277060]])
* 12:38 jbond42: update python(3)-pygments
* 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
* 11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675751{{!}}Disable legacy javascript global variables in group1]], Some increase in client errors is expected ([[phab:T72470|T72470]]) (duration: 01m 11s)
* 09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 [[phab:T250666|T250666]]
* 08:05 dcausse: refreshing wdqs entities ([[phab:T278693|T278693]])
* 07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - [[phab:T278734|T278734]]
* 07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
* 06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
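
The 16:06 and 16:07 conftool actions above select hosts with patterns such as <code>name=mw12.*</code>. The following is a minimal sketch of how such a pattern could split a host list. It assumes the selector value is a regular expression matched against the whole hostname, which is an assumption about conftool's behaviour and not something this log confirms. <code>select_hosts</code> is a hypothetical helper, and the host list is only a sample of names that appear in the entries above.

```python
import re

# Sample hostnames taken from the log entries above.
HOSTS = [
    "mw1293.eqiad.wmnet",
    "mw1294.eqiad.wmnet",
    "mw1295.eqiad.wmnet",
    "mw1304.eqiad.wmnet",
    "mw1307.eqiad.wmnet",
    "mw1308.eqiad.wmnet",
]


def select_hosts(pattern, hosts):
    """Return the hosts whose full name matches the regex pattern.

    Hypothetical helper: whole-string regex matching is an assumption,
    not conftool's documented behaviour.
    """
    compiled = re.compile(pattern)
    return [h for h in hosts if compiled.fullmatch(h)]


# Mirrors the split described in the 16:07 entry:
# mw12* become videoscalers, mw13* become jobrunners.
videoscalers = select_hosts(r"mw12.*", HOSTS)
jobrunners = select_hosts(r"mw13.*", HOSTS)
print("videoscaler:", videoscalers)
print("jobrunner:", jobrunners)
```

Matching the whole name rather than a prefix keeps a pattern such as <code>mw12.*</code> from accidentally selecting a host whose name merely contains that string.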


== 2021-03-29 ==
== 2021-07-30 ==
* 19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:16 ryankemper: [[phab:T267927|T267927]] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id [[phab:T267927|T267927]] --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}} (duration: 00m 08s)
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}}
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - [[phab:T278478|T278478]] (duration: 01m 08s)
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: [[gerrit:675161{{!}}Wrap most of functionalities depending on protect mode in a condition]] ([[phab:T278478|T278478]]) (duration: 01m 08s)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
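
The db2104 dbctl commits above step the host back into service at 5%, 10%, 15%, 25%, 50%, 75% and 100%, roughly 15 minutes apart. The following is a minimal sketch that reproduces that timetable. The percentages and spacing are read off the log entries; <code>repool_schedule</code> is a hypothetical helper and is not part of dbctl.

```python
from datetime import datetime, timedelta

# Percentages and the 15-minute spacing are read off the dbctl entries above.
STEPS = [5, 10, 15, 25, 50, 75, 100]
INTERVAL = timedelta(minutes=15)


def repool_schedule(start, steps=STEPS, interval=INTERVAL):
    """Yield (time, percentage) pairs for a staged repool.

    Hypothetical helper for illustration only; it does not talk to dbctl.
    """
    for i, pct in enumerate(steps):
        yield start + i * interval, pct


# First step in the log is at 04:55 on 2021-07-30.
schedule = list(repool_schedule(datetime(2021, 7, 30, 4, 55)))
for when, pct in schedule:
    print(f"{when:%H:%M} db2104 pooled at {pct}%")
```

Small early steps give a freshly upgraded replica time to warm up before it takes its full share of traffic.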


== 2021-03-27 ==
== 2021-07-29 ==
* 19:25 elukey: powercycle elastic1060 - [[phab:T278630|T278630]]
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 06:10 ryankemper: [[phab:T267927|T267927]] `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
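
Cookbook runs in the entries above close with a line of the form <code>END (PASS|FAIL) - Cookbook NAME (exit_code=N)</code>. The following is a minimal sketch for pulling failed runs out of such lines. The regular expression is written against the entry shape seen in this log and is an assumption, not an official format; <code>parse_end</code> is a hypothetical helper.

```python
import re

# Pattern written against the entry shape seen in this log; not an official format.
END_RE = re.compile(
    r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>\S+): "
    r"END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<exit_code>\d+)\)"
)


def parse_end(line):
    """Return a dict of fields for a cookbook END entry, or None otherwise."""
    match = END_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["exit_code"] = int(fields["exit_code"])
    return fields


# Three entries copied from the log above; the last is not a cookbook entry.
sample = [
    "* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)",
    "* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
    "* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)",
]
parsed = [parse_end(line) for line in sample]
failed = [p["cookbook"] for p in parsed if p and p["status"] == "FAIL"]
print(failed)
```

Returning <code>None</code> for free-form entries lets the same loop run over a whole day's log without special-casing manual messages.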


== 2021-03-26 ==
== 2021-07-28 ==
* 22:27 tzatziki: reset password for Philroc
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 12:56 moritzm: drain ganeti1014
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 12:37 moritzm: drain ganeti1013
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 09:28 moritzm: drain ganeti1012
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 08:38 moritzm: drain ganeti1010
* 13:08 moritzm: installing python3.5 security updates on stretch
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 11:27 moritzm: installing nginx security updates on thumbor*
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2021-03-25 ==
== 2021-07-27 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 08s)
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 06s)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' ` finished ([[phab:T275337|T275337]])
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # [[phab:T278391|T278391]]
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # [[phab:T275337|T275337]]
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|39cd4f15a3900783ac0e9a213004a28f18298a23}}: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 09s)
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcfb7feaace1f397169e5e1bab7efd4e5f605a0f}}: ruwiki: flaggedrevs: Do not remove autoreview group ([[phab:T275337|T275337]]) (duration: 01m 14s)
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|3fb664682bea3c4d1448b0937f938e810268bac3}}: ruwiki: flaggedrevs: Revoke review from sysop group ([[phab:T275811|T275811]]) (duration: 01m 06s)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (3/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (2/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (1/3; [[phab:T275819|T275819]]) (duration: 01m 10s)
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62be4e738a4fd45256027bb09b010ab152f19850}}: Disable magic links on enwiki ([[phab:T275951|T275951]]) (duration: 01m 20s)
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 18:09 marxarelli: scap sync-file .pipeline Config: [[gerrit:674132{{!}}Include patches in restricted image (T271274)]]
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:06 hnowlan: draining and restarting aqs1004-b cassandra
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 17:45 hnowlan: draining and restarting aqs1004-a cassandra
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 16:32 moritzm: restarting apache on an-tool1007/turnilo
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 16:24 jbond42: restart slapd on ldap-replica
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 16:22 jbond42: restart slapd on ldap-corp
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 16:20 jbond42: restart apache on lists1002
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 16:18 jbond42: restart apache on netbox
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - [[phab:T278400|T278400]] (duration: 01m 32s)
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 16:12 jbond42: restart routinator on rpki*
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 16:12 moritzm: restarting nginx on apt*
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 16:10 moritzm: restarting apache on dbmonitor
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 16:08 moritzm: restart Apache on matomo/piwik
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 16:03 jbond42: restart apache service on gerrit
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 16:02 jbond42: restart idp service
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 16:01 ema: A:cp rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki>-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 15:45 moritzm: installing openssl updates on buster
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 13:45 moritzm: drain ganeti1009
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 12:14 moritzm: drain ganeti1008
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674861{{!}}Disable Legacy javascript in fawikiquote]] ([[phab:T72470|T72470]]) (duration: 01m 07s)
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 11:46 moritzm: drain ganeti1007
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: [[gerrit:674382{{!}}Inform anonymous A/B test by tracking time from navigationStart (T275807)]] (duration: 01m 09s)
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:33 ladsgroup@deploy1002: Synchronized dblists/: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]], Part II ([[phab:T278369|T278369]]) (duration: 01m 07s)
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:32 ladsgroup@deploy1002: Synchronized wmf-config: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]] ([[phab:T278369|T278369]]) (duration: 01m 30s)
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:11 moritzm: installing aspell security updates
* 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:10 moritzm: drain ganeti1006
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:34 moritzm: drain ganeti1005
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 11:23 Lucas_WMDE: EU backport+config window done
* 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 09:26 moritzm: drain ganeti2024
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 08:45 moritzm: drain ganeti2023
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 ([[phab:T276687|T276687]])
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia ([[phab:T276687|T276687]])
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 07:35 jynus: restart db2135 [[phab:T278408|T278408]] [[phab:T273281|T273281]]
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 07:05 effie: enable puppet on all mediawiki servers
* 08:57 _joe_: repooling mw225[12] for apis
* 06:57 XioNoX: Option 82: use-vlan-id
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 06:53 effie: enable puppet on jobrunners
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 06:47 effie: enable puppet on parsoid
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
* 08:36 jynus: reenabled puppet on mwmaint1002
* 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 07:52 jynus: disabling puppet on mwmaint1002
* 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 07:14 moritzm: installing krb security updates on buster
* 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json
* 00:34 mutante: mw2377, mw2378 - first scap pull
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
* 00:29 legoktm: syncing facts for puppet-compiler
* 00:23 mutante: mw2377, mw2378 - reboot
* 00:14 twentyafterfour: phabricator update complete
* 00:10 twentyafterfour: deploying phabricator
* 00:05 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`


== 2021-03-24 ==
== 2021-07-26 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 21:19 mutante: webperf2001 - restarted apache
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - [[phab:T277865|T277865]] (duration: 01m 07s)
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - [[phab:T278379|T278379]] (duration: 01m 07s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - [[phab:T278375|T278375]] (duration: 01m 07s)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 19:57 ryankemper: [[phab:T267927|T267927]] Host key is missing for `wdqs2008` leading to `data-transfer` cookbook failing, looking into resolving
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 06:39 moritzm: installing krb5 security updates
* 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:42 ryankemper: [[phab:T267927|T267927]] Re-enabled puppet on `wdqs2008` and ran puppet agent
* 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
* 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 3/3) (duration: 01m 08s)
* 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 2/3) (duration: 01m 06s)
* 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 1/3) (duration: 01m 07s)
* 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 3/3) (duration: 01m 06s)
* 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 2/3) (duration: 01m 07s)
* 18:52 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
* 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 1/3) (duration: 01m 08s)
* 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5aa050602954a3cab0c7e0c4b10efb0f957efb59}}: Promote several Growth target wikis out of dark mode ([[phab:T277491|T277491]]; [[phab:T276830|T276830]]; [[phab:T276123|T276123]]; [[phab:T276816|T276816]]; [[phab:T275550|T275550]]; [[phab:T276450|T276450]]) (duration: 01m 08s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|333393dfe59deb0ec4d7df6dd92372a705f65b85}}: Add autopatrol to autoreviewers in en.wikibooks ([[phab:T278300|T278300]]) (duration: 01m 09s)
* 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:25 effie: upgrade memcached on mc-gp* hosts
* 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:42 moritzm: reduce RAM for irc2001 to 2G; it was originally created with 8G [[phab:T224579|T224579]]
* 15:35 effie: enable puppet on all mediawiki + memcached hosts
* 15:20 moritzm: drain ganeti2022
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 14:35 moritzm: drain ganeti2021
* 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
* 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 13:29 moritzm: installing irc1001
* 13:15 moritzm: drain ganeti2020
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 12:28 effie: enabling puppet on mediawiki and memcached servers
* 12:10 jynus: restart dbprov200[12] [[phab:T271913|T271913]]
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
* 11:57 Andrew-WMDE_: EU deploys done
* 11:53 jynus: restart dbprov100[12] [[phab:T271913|T271913]]
* 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: [[gerrit:674367{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 08s)
* 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
* 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: [[gerrit:674366{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 07s)
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
* 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673326{{!}}Enable CodeMirror accessibility colors on initial wikis (T276346)]] (duration: 01m 08s)
* 11:15 jynus: restart serially db2097 db2098 db2099 db2100 [[phab:T271913|T271913]]
* 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673312{{!}}Enable bracket matching on group0 and wikitech (T273591)]] (duration: 01m 25s)
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
* 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
* 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
* 10:31 jynus: restart db1171 [[phab:T271913|T271913]]
* 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:14 jynus: restart db1145 [[phab:T271913|T271913]]
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:03 jynus: restart db1139 [[phab:T271913|T271913]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
* 09:51 jynus: restart db1116 [[phab:T271913|T271913]]
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
* 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 08:16 gehel: restarting wdqs updater on all nodes for config change
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
* 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
* 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
* 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
* 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
* 07:09 moritzm: installing squid security updates
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
* 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
* 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
* 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:41 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 03:41 ryankemper: [[phab:T274204|T274204]] Restarting the `codfw` rolling reboot; the timestamp argument should prevent it from wasting time on nodes that have already been rebooted
* 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 03:39 ryankemper: [[phab:T274204|T274204]] Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 02:38 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:59 ryankemper: [[phab:T274204|T274204]] For now I'll proceed to the reboots of `codfw`
* 01:59 ryankemper: [[phab:T274204|T274204]] `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
* 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
* 01:49 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
* 01:36 eileen: civicrm revision changed from {{Gerrit|f36a0b08f0}} to {{Gerrit|ad430721f6}}, config revision is {{Gerrit|26b02db7ba}}
* 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE


== 2021-03-23 ==
== 2021-07-24 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php  --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:41 eileen: civicrm revision changed from {{Gerrit|39d24e8b0a}} to {{Gerrit|f36a0b08f0}}, config revision is {{Gerrit|26b02db7ba}}
* 20:24 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:24 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 20:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts auth1002.eqiad.wmnet
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 19:51 jforrester@deploy1002: Finished deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans (duration: 00m 08s)
* 19:51 jforrester@deploy1002: Started deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove schema overrides for 6 finished EL migrations - [[phab:T267347|T267347]] [[phab:T271164|T271164]] [[phab:T267351|T267351]] [[phab:T267348|T267348]] [[phab:T267343|T267343]] [[phab:T267353|T267353]] (duration: 01m 07s)
* 18:40 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/vendor/: Bump wikimedia/parsoid to 0.13.0-a29 (duration: 01m 16s)
* 18:20 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:16 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:10 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add irc2001.wikimedia.org (running buster) as second irc server ([[phab:T224579|T224579]]) (duration: 01m 08s)
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:32 moritzm: installing libsdl2 security updates
* 15:31 akosiaris: pool echostore for eqiad (the first of the larger services traffic wise)
* 15:31 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=echostore
* 15:25 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T274200|T274200]])
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:43 akosiaris: pool more services in eqiad k8s. [[phab:T277741|T277741]]. Only the very large ones traffic-wise are still on codfw
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=recommendation-api
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=push-notifications
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=proton
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mobileapps
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=linkrecommendation
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams-internal
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams
* 14:20 akosiaris: pool a few more services in eqiad k8s. [[phab:T277741|T277741]]
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=wikifeeds
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=termbox
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=similar-users
* 14:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36
* 14:06 akosiaris: pool a few services in eqiad k8s. [[phab:T277741|T277741]]
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=api-gateway
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apertium
* 14:05 moritzm: installing pygments security updates on stretch
* 14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2008.codfw.wmnet
* 13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2008.codfw.wmnet
* 13:55 hashar@deploy1002: Finished scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - [[phab:T274940|T274940]] (duration: 31m 57s)
* 13:54 elukey: sudo systemctl reload apache2 on prometheus[12]00[34] to pick up new k8s-mlserve instance settings
* 13:28 moritzm: drain ganeti2008
* 13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
* 13:23 hashar@deploy1002: Started scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
* 13:15 ema: cp3054: install varnishkafka built explicitly against varnish 6.0.1-1wm2 to fix broken dpkg status [[phab:T264398|T264398]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15054 and previous config saved to /var/cache/conftool/dbconfig/20210323-130543-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15053 and previous config saved to /var/cache/conftool/dbconfig/20210323-130153-root.json
* 12:58 moritzm: drain ganeti2018
* 12:58 akosiaris: remove and decommission argon, chlorine, acrab, acrux [[phab:T277741|T277741]], [[phab:T277191|T277191]]
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15052 and previous config saved to /var/cache/conftool/dbconfig/20210323-125155-root.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15051 and previous config saved to /var/cache/conftool/dbconfig/20210323-125039-root.json
* 12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15050 and previous config saved to /var/cache/conftool/dbconfig/20210323-124650-root.json
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 85%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15049 and previous config saved to /var/cache/conftool/dbconfig/20210323-123651-root.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15048 and previous config saved to /var/cache/conftool/dbconfig/20210323-123535-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15047 and previous config saved to /var/cache/conftool/dbconfig/20210323-123146-root.json
* 12:27 moritzm: drain ganeti2017
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15046 and previous config saved to /var/cache/conftool/dbconfig/20210323-122148-root.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15045 and previous config saved to /var/cache/conftool/dbconfig/20210323-122032-root.json
* 12:17 akosiaris: remove all scheduled downtimes for k8s cluster. [[phab:T277741|T277741]]
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15044 and previous config saved to /var/cache/conftool/dbconfig/20210323-121642-root.json
* 12:09 moritzm: drain ganeti2016
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15043 and previous config saved to /var/cache/conftool/dbconfig/20210323-120644-root.json
* 11:55 moritzm: installing libcaca security updates
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15042 and previous config saved to /var/cache/conftool/dbconfig/20210323-115141-root.json
* 11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
* 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 35%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15041 and previous config saved to /var/cache/conftool/dbconfig/20210323-113637-root.json
* 11:31 Lucas_WMDE: EU backport&config window done
* 11:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:674098{{!}}Enable DiscussionTools' beta features on dewiki (T276494)]] (duration: 00m 58s)
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15040 and previous config saved to /var/cache/conftool/dbconfig/20210323-112133-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15039 and previous config saved to /var/cache/conftool/dbconfig/20210323-110630-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P15038 and previous config saved to /var/cache/conftool/dbconfig/20210323-110553-marostegui.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15037 and previous config saved to /var/cache/conftool/dbconfig/20210323-110347-root.json
* 11:01 moritzm: installing tomcat8 security updates
* 10:56 jayme: all services re-deployed to k8s eqiad - [[phab:T277741|T277741]]
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15036 and previous config saved to /var/cache/conftool/dbconfig/20210323-105126-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15035 and previous config saved to /var/cache/conftool/dbconfig/20210323-104843-root.json
* 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 10:43 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:41 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15034 and previous config saved to /var/cache/conftool/dbconfig/20210323-103623-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15033 and previous config saved to /var/cache/conftool/dbconfig/20210323-103340-root.json
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:24 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:22 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15031 and previous config saved to /var/cache/conftool/dbconfig/20210323-102119-root.json
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:19 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.33 (duration: 01m 48s)
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15030 and previous config saved to /var/cache/conftool/dbconfig/20210323-101836-root.json
* 10:16 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.32 (duration: 14m 47s)
* 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1005.eqiad.wmnet
* 10:02 hashar: scap clean --delete 1.36.0-wmf.32 # [[phab:T274940|T274940]]
* 10:01 hashar: Applied security patches for 1.36.0-wmf.36 # [[phab:T274940|T274940]]
* 09:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
* 09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
* 09:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15029 and previous config saved to /var/cache/conftool/dbconfig/20210323-095437-marostegui.json
* 09:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
* 09:53 akosiaris: deploy helmfile.d/admin_ng for eqiad [[phab:T277741|T277741]]
* 09:53 hashar: scap prep 1.36.0-wmf.36 # [[phab:T274940|T274940]]
* 09:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:53 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
* 09:53 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
* 09:51 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:50 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
* 09:50 jayme@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
* 09:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:46 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
* 09:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
* 09:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15028 and previous config saved to /var/cache/conftool/dbconfig/20210323-094257-marostegui.json
* 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
* 09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
* 09:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
* 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
* 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1005.eqiad.wmnet
* 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
* 09:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
* 09:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
* 09:36 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
* 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
* 09:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
* 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1017.eqiad.wmnet
* 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
* 09:32 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1165 to dbctl, depooled - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15027 and previous config saved to /var/cache/conftool/dbconfig/20210323-093246-marostegui.json
* 09:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
* 09:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
* 09:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
* 09:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
* 09:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
* 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
* 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
* 09:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
* 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 to clone db1181 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15025 and previous config saved to /var/cache/conftool/dbconfig/20210323-092600-marostegui.json
* 09:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
* 09:18 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dc=eqiad,cluster=kubernetes,name=kubernetes1017.eqiad.wmnet
* 09:17 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
* 09:17 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
* 09:16 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1017.eqiad.wmnet
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P15024 and previous config saved to /var/cache/conftool/dbconfig/20210323-091432-marostegui.json
* 09:05 akosiaris: reboot kubetcd100[456] for kernel upgrades. [[phab:T277741|T277741]] [[phab:T273278|T273278]]
* 09:04 akosiaris: empty etcd [[phab:T277741|T277741]]
* 08:43 akosiaris: poweroff argon and chlorine [[phab:T277741|T277741]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15023 and previous config saved to /var/cache/conftool/dbconfig/20210323-083957-root.json
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
* 08:33 akosiaris: eqiad services in k8s depooled. [[phab:T277741|T277741]]
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
* 08:28 akosiaris: downtime all services in [[phab:T277741|T277741]] for 24H
* 08:25 akosiaris: beginning the k8s upgrade/reinit process. [[phab:T277741|T277741]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15022 and previous config saved to /var/cache/conftool/dbconfig/20210323-082454-root.json
* 08:24 moritzm: installing mariadb-10.3 updates on buster (just client-side libs/tools, unrelated to the main wmf-mariadb packages)
* 08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
* 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15021 and previous config saved to /var/cache/conftool/dbconfig/20210323-082213-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15020 and previous config saved to /var/cache/conftool/dbconfig/20210323-080949-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15019 and previous config saved to /var/cache/conftool/dbconfig/20210323-080709-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15017 and previous config saved to /var/cache/conftool/dbconfig/20210323-075445-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15016 and previous config saved to /var/cache/conftool/dbconfig/20210323-075253-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15015 and previous config saved to /var/cache/conftool/dbconfig/20210323-075230-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15014 and previous config saved to /var/cache/conftool/dbconfig/20210323-075216-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15013 and previous config saved to /var/cache/conftool/dbconfig/20210323-075206-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15012 and previous config saved to /var/cache/conftool/dbconfig/20210323-073726-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15011 and previous config saved to /var/cache/conftool/dbconfig/20210323-073713-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15010 and previous config saved to /var/cache/conftool/dbconfig/20210323-073702-root.json
* 07:36 elukey: create a 50g lvm volume on prometheus[12]00[34] for the k8s-mlserve cluster - [[phab:T272918|T272918]]
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15009 and previous config saved to /var/cache/conftool/dbconfig/20210323-072352-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15008 and previous config saved to /var/cache/conftool/dbconfig/20210323-072223-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15007 and previous config saved to /var/cache/conftool/dbconfig/20210323-072209-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15006 and previous config saved to /var/cache/conftool/dbconfig/20210323-070849-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15005 and previous config saved to /var/cache/conftool/dbconfig/20210323-070719-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15004 and previous config saved to /var/cache/conftool/dbconfig/20210323-070705-root.json
* 07:02 marostegui: Upgrade kernel on db1101
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15003 and previous config saved to /var/cache/conftool/dbconfig/20210323-065947-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15002 and previous config saved to /var/cache/conftool/dbconfig/20210323-065836-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15001 and previous config saved to /var/cache/conftool/dbconfig/20210323-065345-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15000 and previous config saved to /var/cache/conftool/dbconfig/20210323-063842-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14999 and previous config saved to /var/cache/conftool/dbconfig/20210323-062942-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 10%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14998 and previous config saved to /var/cache/conftool/dbconfig/20210323-062338-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086', diff saved to https://phabricator.wikimedia.org/P14997 and previous config saved to /var/cache/conftool/dbconfig/20210323-062059-marostegui.json
* 06:20 marostegui: Upgrade kernel on db1086
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14996 and previous config saved to /var/cache/conftool/dbconfig/20210323-060701-root.json
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 master and remove read-only from s7 [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14995 and previous config saved to /var/cache/conftool/dbconfig/20210323-060216-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14994 and previous config saved to /var/cache/conftool/dbconfig/20210323-060104-marostegui.json
* 06:00 marostegui: Starting s7 eqiad failover from db1086 to db1136 - [[phab:T274336|T274336]]
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1174 to api [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14993 and previous config saved to /var/cache/conftool/dbconfig/20210323-051346-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1136 before failover [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14992 and previous config saved to /var/cache/conftool/dbconfig/20210323-051210-marostegui.json
* 00:07 tstarling@deploy1002: Synchronized wmf-config: use RequestTimeout library step 3: clean up (duration: 00m 58s)
* 00:06 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: use RequestTimeout library step 2: enable new system (duration: 00m 57s)
* 00:04 tstarling@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: use RequestTimeout library step 1: disable old request timeout system (duration: 00m 58s)


== 2021-03-22 ==
== 2021-07-23 ==
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: [[phab:T262612|T262612]]: Start glent m1 ab test (duration: 01m 53s)
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
* 16:15 effie: enable puppet on mc-gp* hosts
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
* 15:47 papaul: power down wdqs2002 for iDRAC reset
* 22:52 mutante: decom mw2249
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:08 sbassett: Deployed security patch for [[phab:T272244|T272244]]
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) ([[phab:T277127|T277127]])
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|25247c9cbba3d3741908164f2d15fb8497ce8b5e}}: hrwiki: Configure mentorship for Growth team features ([[phab:T275684|T275684]]) (duration: 01m 00s)
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|951601f7a4c887f21e209b32dbd1cfd3da084816}}: Grant enwiki pagemovers the delete-redirect right ([[phab:T278131|T278131]]) (duration: 00m 59s)
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - [[phab:T277771|T277771]]
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
* 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:51 moritzm: installing libdbi-perl security updates
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
* 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
* 09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
* 09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
* 09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
* 08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
* 08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) [[phab:T258361|T258361]]
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1165', diff saved to https://phabricator.wikimedia.org/P14965 and previous config saved to /var/cache/conftool/dbconfig/20210322-080020-marostegui.json
* 07:51 elukey: stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - [[phab:T273865|T273865]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14964 and previous config saved to /var/cache/conftool/dbconfig/20210322-073747-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14963 and previous config saved to /var/cache/conftool/dbconfig/20210322-072243-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change', diff saved to https://phabricator.wikimedia.org/P14962 and previous config saved to /var/cache/conftool/dbconfig/20210322-071430-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14961 and previous config saved to /var/cache/conftool/dbconfig/20210322-070740-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14960 and previous config saved to /var/cache/conftool/dbconfig/20210322-065236-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl [[phab:T276302|T276302]]', diff saved to https://phabricator.wikimedia.org/P14959 and previous config saved to /var/cache/conftool/dbconfig/20210322-063732-marostegui.json
* 06:11 marostegui: Sanitize db1124 db2094 db1154: taywiki trvwiki mnwwiktionary
* 04:28 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .


== 2021-03-21 ==
== 2021-07-22 ==
* 10:25 _joe_: restarting gerrit on gerrit1001, using 45G of reserved memory
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 09:22 elukey: install apache2-bin-dbgsym on gerrit1001 - [[phab:T277127|T277127]]
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 08:50 qchris: Restarting apache on gerrit1001 again (all apache workers busy again) see [[phab:T277127|T277127]]
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 08:18 qchris: Restarting apache on gerrit1001 (all apache workers busy)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2021-03-20 ==
== 2021-07-21 ==
* 00:22 tzatziki: altering emails for STei (WMF) and SGrabarczuk (WMF)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-03-19 ==
== 2021-07-20 ==
* 21:11 mutante: scandium - stop apache and rerun puppet which fails after reimaging because it tries to run an nginx on port 80 which is already used by apache [[phab:T268248|T268248]]
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 20:15 mutante: scandium - reimaging with buster
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2245.codfw.wmnet
* 17:06 rzl: enabled puppet on A:mw
* 19:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2245.codfw.wmnet
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2244.codfw.wmnet
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:53 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1002.wikimedia.org
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 19:50 mutante: testreduce1001 - confirmed MariaDB @@datadir is /srv/data/mysql and deleting /var/lib/mysql ([[phab:T277580|T277580]])
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 19:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2244.codfw.wmnet
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2245.codfw.wmnet
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1002.wikimedia.org
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2244.codfw.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet,service=canary
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet,service=canary
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2252.codfw.wmnet,service=canary
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2251.codfw.wmnet,service=canary
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 19:24 mutante: deploy2002 - re-enabled puppet, reverted patch of scap-sync-master
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 18:46 mutante: deploy2002 - disable puppet, copy modified version of scap-master-sync over it that does not --exclude="**/cache/l10n/*.cdb" (for [[phab:T275826|T275826]])
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 16:01 effie: upgrade memcached on mc-gp200*
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 12:36 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 12:10 effie: upgrade memcached on mc1026,mc2026
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 11:30 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 10:42 moritzm: installing dbmonitor1002 [[phab:T224589|T224589]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:11 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 10:05 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 10:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 08:22 elukey: upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:16 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (change hadn't been merged when I ran the agent earlier)
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 04:04 eileen: civicrm revision changed from {{Gerrit|99bf1c9210}} to {{Gerrit|39d24e8b0a}}, config revision is {{Gerrit|26b02db7ba}}
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 03:27 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 03:26 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'`
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 02:43 ryankemper: [[phab:T275885|T275885]] Revoking current `relforge` TLS cert in advance of generation of new cert: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet`
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 00:51 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: [[phab:T277772|T277772]] (duration: 00m 58s)
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 00:45 mutante: testreduce1001 - stop mysql; rsyncing /var/lib/mysql to /srv/data/mysql ([[phab:T277580|T277580]])
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2021-03-18 ==
== 2021-07-19 ==
* 23:56 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Don't define a default icon ([[phab:T274199|T274199]]) (duration: 00m 57s)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 23:38 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 57s)
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 23:35 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 58s)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:25 dduvall@deploy1002: Synchronized .pipeline: config: [[gerrit:673375{{!}}Use build environment HTTP proxy for APT sources (T277109)]] (duration: 01m 02s)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 23:06 brennen: train status: 1.36.0-wmf.35 ([[phab:T274939|T274939]]) stable on all wikis after deploy of hotfix for [[phab:T277795|T277795]]
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 22:53 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 01m 07s)
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 22:30 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 22:29 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 22:25 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 20:37 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: (no justification provided) (duration: 01m 05s)
* 18:46 brennen: gerrit1001: restarting gerrit
* 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.35
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 18:28 legoktm: re-enabled puppet on registry*
* 18:38 brennen: re-enabling puppet on gerrit1001
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|44eddcc}}: hrwiki: Deploy Growth features to newcomers ([[phab:T275684|T275684]]) (duration: 01m 08s)
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 2/2) (duration: 01m 08s)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 1/2) (duration: 01m 11s)
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 17:58 legoktm: disabled puppet on registry* for rolling out https://gerrit.wikimedia.org/r/672537
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 17:50 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 2/2) (duration: 01m 08s)
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2242.codfw.wmnet
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 17:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 1/2) (duration: 01m 10s)
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 17:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 09s)
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 17:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 10s)
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 17:40 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2242.codfw.wmnet
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2241.codfw.wmnet
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2241.codfw.wmnet
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2240.codfw.wmnet
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 16:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2240.codfw.wmnet
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2239.codfw.wmnet
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 16:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2239.codfw.wmnet
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2242.codfw.wmnet
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2241.codfw.wmnet
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2240.codfw.wmnet
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2239.codfw.wmnet
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 15:33 shdubsh: clean up dead letter queue and restart all logstashes
* 17:30 volans: running puppet on elastic2038 after network was restored
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 14:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:37 dcausse: repooling wdqs1005
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 14:29 hashar: Restarting CI Jenkins for plugin upgrade
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 13:49 elukey: reboot analytics1066
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 13:23 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/Wikibase/repo: [[gerrit:673108{{!}}languageLabelDescriptionAliases: use getLanguageNameByCode]] ([[phab:T275611|T275611]] [[phab:T277722|T277722]]) (duration: 01m 14s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 12:58 jbond42: upload cas_6.3.2 to apt buster-wikimedia
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 11:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 11:34 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 11:25 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 17:23 volans: running authdns-update to force-update authdns2001
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|896c9f019b17d1ad3a1589d377158ca2fb91ebaa}}: flaggedrevs: Disable multiple dimensions in hewikisource (duration: 01m 09s)
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 11:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/GrowthExperiments/includes/HomepageHooks.php: {{Gerrit|3b2aa1aa28e9d204f32ae937a84ec211137cbb2e}}: Remove variant C from list of valid variants ([[phab:T277727|T277727]]) (duration: 01m 09s)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 11:14 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 11:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0005676e704cad907655a4a0bca7bd2164714b1c}}: GrowthExperiments: set $wgGEHomepageNewAccountVariants to D only ([[phab:T277727|T277727]]) (duration: 01m 10s)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: NOOP: {{Gerrit|e7f5eac}}: Enable CentralAuth IRC feed in beta cluster ([[phab:T277432|T277432]]) (duration: 01m 12s)
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 09:13 _joe_: hard reboot of snapshot1005
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 09:04 _joe_: attempted reboot of snapshot1005, read-only filesystem and probably disks are broken beyond repair
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 08:27 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14946 and previous config saved to /var/cache/conftool/dbconfig/20210318-080258-root.json
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 08:02 akosiaris: reimage ml-serve1004 to debug a docker volume_group issue
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14945 and previous config saved to /var/cache/conftool/dbconfig/20210318-074754-root.json
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14944 and previous config saved to /var/cache/conftool/dbconfig/20210318-073250-root.json
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 07:20 dcausse: depooling & restarting blazegraph on wdqs1005
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 07:19 marostegui: Deploy schema change on s4 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14943 and previous config saved to /var/cache/conftool/dbconfig/20210318-071747-root.json
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1161 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14942 and previous config saved to /var/cache/conftool/dbconfig/20210318-063241-marostegui.json
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2120', diff saved to https://phabricator.wikimedia.org/P14941 and previous config saved to /var/cache/conftool/dbconfig/20210318-062201-marostegui.json
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P14940 and previous config saved to /var/cache/conftool/dbconfig/20210318-060445-marostegui.json
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 03:46 andrewbogott: restarting slapd on seaborgium, serpens, and r-o ldap replicas (we're getting irregular connection failures)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 00:05 eileen: tools revision changed from {{Gerrit|b7b4060c30}} to {{Gerrit|ef54260b0d}}
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2021-03-17 ==
== 2021-07-16 ==
* 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c730dd5feb865a8325279cd4e76c133512f14251}}: idwiki: Deploy Growth features to newcomers ([[phab:T259024|T259024]]) (duration: 01m 08s)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:40 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5c14e7d2045f0905f7e85b249e821bbe8d69c600}}: Define confirmed group in MediaWikiServices hook ([[phab:T275334|T275334]], [[phab:T277704|T277704]], [[phab:T275310|T275310]], [[phab:T275333|T275333]]) (duration: 01m 08s)
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:30 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/CirrusSearch/profiles/FallbackProfiles.config.php: Add fallback profile including glent m1 (duration: 01m 42s)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 22:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 22:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 20:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 20:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
* 15:48 vgutierrez: restart pybal on lvs2010
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 20:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 20:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 20:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 20:42 andrew@deploy1002: Finished deploy [horizon/deploy@17ea780]: display volume usage summaries (duration: 03m 34s)
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 20:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 20:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 20:39 andrew@deploy1002: Started deploy [horizon/deploy@17ea780]: display volume usage summaries
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 20:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 20:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 20:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2238.codfw.wmnet
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2238.codfw.wmnet
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2237.codfw.wmnet
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2237.codfw.wmnet
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2236.codfw.wmnet
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 19:48 andrew@deploy1002: Finished deploy [horizon/deploy@3c2d1ee]: support VM resizing (duration: 03m 42s)
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 19:44 andrew@deploy1002: Started deploy [horizon/deploy@3c2d1ee]: support VM resizing
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 19:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2236.codfw.wmnet
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2238.codfw.wmnet
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2237.codfw.wmnet
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2236.codfw.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2235.codfw.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 19:29 mutante: testreduce1001 - rebooted, fdisk /dev/sdb, create partition table, create primary partition, mkfs.ext4 /dev/vdb1
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2235.codfw.wmnet
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:18 andrew@deploy1002: Finished deploy [horizon/deploy@8967660]: clean up a reverted hack (duration: 03m 25s)
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 19:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2234.codfw.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 19:14 andrew@deploy1002: Started deploy [horizon/deploy@8967660]: clean up a reverted hack
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.35 (duration: 01m 26s)
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 19:05 mutante: ganeti1011 - rebooting VM testreduce1001 on ganeti level for [[phab:T277580|T277580]]
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.35
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 19:02 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2234.codfw.wmnet
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2233.codfw.wmnet
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:58 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/: sessionTick: Tick right away on sessionReset ([[phab:T277515|T277515]]) (duration: 01m 10s)
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 18:52 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/vendor/: Bump wikimedia/parsoid to 0.13.0-a28 ([[phab:T276649|T276649]]) (duration: 01m 18s)
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2233.codfw.wmnet
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2235.codfw.wmnet
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2234.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2233.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2232.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 18:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Define Portal and Portal talk namespace for niawiki ([[phab:T277671|T277671]]) (duration: 01m 11s)
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2232.codfw.wmnet
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2231.codfw.wmnet
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2231.codfw.wmnet
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2230.codfw.wmnet
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 17:50 razzi: update firewall rules to allow mysql-sqoop in analytics-in4 to access clouddb1021 - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/672797
* 17:47 ejegg: updated payments-wiki from {{Gerrit|0405ea1723}} to {{Gerrit|b06009c099}}
* 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2230.codfw.wmnet
* 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:50 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 02m 32s)
* 16:48 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
* 16:45 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 00m 07s)
* 16:45 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
* 16:44 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: more support for disabled flavors (duration: 00m 07s)
* 16:44 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: more support for disabled flavors
* 16:38 effie: upgrade memcached on mc1025, mc2025
* 16:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
* 16:04 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/RevisionRecord.php: (no justification provided) (duration: 00m 58s)
* 15:54 ejegg: updated standalone SmashPig deployment from {{Gerrit|58b070db1a}} to {{Gerrit|250a8570d1}}
* 15:23 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dbmonitor1002.wikimedia.org
* 14:56 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host dbmonitor1002.wikimedia.org
* 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14935 and previous config saved to /var/cache/conftool/dbconfig/20210317-142532-root.json
* 14:18 jayme: rebooting testreduce1001 for [[phab:T277580|T277580]]
* 14:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14934 and previous config saved to /var/cache/conftool/dbconfig/20210317-141028-root.json
* 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
* 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics
* 14:01 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28] (duration: 04m 19s)
* 13:59 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 13:58 moritzm: added bullseye tftpboot environment [[phab:T275873|T275873]]
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28]
* 13:56 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28] (duration: 00m 06s)
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14933 and previous config saved to /var/cache/conftool/dbconfig/20210317-135522-root.json
* 13:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 13:52 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28] (duration: 11m 36s)
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics-external
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-logging-external
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=api-gateway
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=echostore
* 13:47 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 13:46 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 13:41 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 13:40 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28]
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14932 and previous config saved to /var/cache/conftool/dbconfig/20210317-134018-root.json
* 13:38 kormat: stopping db2137:s5 [[phab:T277632|T277632]]
* 13:33 kormat: stopping db2089:s5 [[phab:T277632|T277632]]
* 13:31 otto@deploy1002: Finished deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - [[phab:T207171|T207171]], [[phab:T263697|T263697]] (duration: 03m 24s)
* 13:27 otto@deploy1002: Started deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - [[phab:T207171|T207171]], [[phab:T263697|T263697]]
* 13:23 jynus: stopping s5 instance on db2099 and restoring from backup [[phab:T277632|T277632]]
* 13:17 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams
* 13:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams-internal
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mobileapps
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=wikifeeds
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=termbox
* 13:12 moritzm: installing tiff security updates
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=similar-users
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=push-notifications
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=proton
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=linkrecommendation
* 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
* 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=apertium
* 12:11 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
* 12:10 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-main
* 11:49 marostegui: Deploy schema change on s8, lag will appear on wiki replicas [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P14931 and previous config saved to /var/cache/conftool/dbconfig/20210317-114746-marostegui.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14930 and previous config saved to /var/cache/conftool/dbconfig/20210317-114601-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14929 and previous config saved to /var/cache/conftool/dbconfig/20210317-113057-root.json
* 11:20 jayme: switch restbase-async back to codfw (the newly initialized cluster)
* 11:17 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
* 11:17 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14928 and previous config saved to /var/cache/conftool/dbconfig/20210317-111553-root.json
* 11:09 moritzm: restarting tomcat on idp.wikimedia.org
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14927 and previous config saved to /var/cache/conftool/dbconfig/20210317-110050-root.json
* 09:59 moritzm: imported PHP 5.6.40 to thirdparty/php56 [[phab:T224589|T224589]]
* 09:47 vgutierrez: restart varnish-fe on cp5011
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P14926 and previous config saved to /var/cache/conftool/dbconfig/20210317-092443-marostegui.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14925 and previous config saved to /var/cache/conftool/dbconfig/20210317-092357-root.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14924 and previous config saved to /var/cache/conftool/dbconfig/20210317-090853-root.json
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=recommendation-api
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14923 and previous config saved to /var/cache/conftool/dbconfig/20210317-090108-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 [[phab:T276302|T276302]]', diff saved to https://phabricator.wikimedia.org/P14922 and previous config saved to /var/cache/conftool/dbconfig/20210317-085852-marostegui.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14921 and previous config saved to /var/cache/conftool/dbconfig/20210317-085350-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14920 and previous config saved to /var/cache/conftool/dbconfig/20210317-084605-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14919 and previous config saved to /var/cache/conftool/dbconfig/20210317-083846-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14918 and previous config saved to /var/cache/conftool/dbconfig/20210317-083101-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14917 and previous config saved to /var/cache/conftool/dbconfig/20210317-081557-root.json
* 07:50 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for schema change', diff saved to https://phabricator.wikimedia.org/P14916 and previous config saved to /var/cache/conftool/dbconfig/20210317-073403-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14915 and previous config saved to /var/cache/conftool/dbconfig/20210317-073024-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14914 and previous config saved to /var/cache/conftool/dbconfig/20210317-071520-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14913 and previous config saved to /var/cache/conftool/dbconfig/20210317-070017-root.json
* 06:52 marostegui: Stop MySQL on db1082 to clone db1161 (lag will appear on s5 on wikireplicas) - [[phab:T258361|T258361]]
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to clone db1161 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14911 and previous config saved to /var/cache/conftool/dbconfig/20210317-065146-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2150 into s7 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14910 and previous config saved to /var/cache/conftool/dbconfig/20210317-064606-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14909 and previous config saved to /var/cache/conftool/dbconfig/20210317-064513-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2150 to s7, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14908 and previous config saved to /var/cache/conftool/dbconfig/20210317-060358-marostegui.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P14907 and previous config saved to /var/cache/conftool/dbconfig/20210317-054206-marostegui.json
* 02:25 eileen: civicrm revision changed from {{Gerrit|8c137b94f0}} to {{Gerrit|99bf1c9210}}, config revision is {{Gerrit|ef2767ab91}}
* 01:55 eileen: civicrm revision changed from {{Gerrit|550be50105}} to {{Gerrit|8c137b94f0}}, config revision is {{Gerrit|ef2767ab91}}


== 2021-03-16 ==
== 2021-07-15 ==
* 23:56 krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: {{Gerrit|I8619ab9e92b}}, [[phab:T277362|T277362]], [[phab:T275531|T275531]] (duration: 00m 58s)
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:51 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/Scribunto/: {{Gerrit|I84e8732d8d}} - tmp logging (duration: 00m 58s)
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 23:47 Krinkle: There is an uncommitted dirty diff in /srv/mediawiki-staging/php-1.36.0-wmf.34/extensions/WikimediaMaintenance/createExtensionTables.php
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 23:31 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1ca4f30c2}}, [[phab:T262612|T262612]] (duration: 00m 57s)
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php -
* 23:22 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Icd6635cb302cc}}, [[phab:T277332|T277332]] (duration: 00m 58s)
* 23:07 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I8d8c94d95c6}} (duration: 00m 59s)
* 23:03 twentyafterfour: applied hotfix to phabricator/src/infrastructure/customfield/storage/PhabricatorCustomFieldStorage.php and restarted php-fpm
* 23:02 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I4097cbcb1d5}} (duration: 00m 59s)
* 22:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ie24eb2077}


== 2021-03-15 ==
== 2021-07-14 ==
* 23:31 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove back-compat from when IRC feed servers was a string ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 23:24 legoktm@deploy1002: Synchronized wmf-config/: Define IRC feed servers as an array in <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 23:23 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Support having multiple IRC feed servers ([[phab:T224579|T224579]]) (duration: 00m 58s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 23:13 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=mw2225.codfw.wmnet
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 23:11 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: GlobalWatchlist: allow watching up to 50 sites ([[phab:T276195|T276195]]) (duration: 01m 04s)
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 21:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps (duration: 03m 02s)
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:59 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 20:55 legoktm: re-enabled puppet on kubestage2001, uncordoned kubestage2002
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas