Difference between revisions of "Server Admin Log"

imported>Labslogbot
(rolling restart of cassandra instances to rule out a single node in funky state causing elevated p99 latency (gwicke))
imported>Stashbot
(ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s))
== June 24 ==
== 2021-09-18 ==
* 01:01 gwicke: rolling restart of cassandra instances to rule out a single node in funky state causing elevated p99 latency
* 01:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s)
* 00:43 ori: experimenting with httpd on mw1041 again
* 01:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s)
* 00:19 gwicke: rolling restart of restbase instances to rule out backend connections as a source for high p99 latencies
* 00:14 ori: experimenting with HHVM shutdown via /stop on the admin server on mw1041


== June 23 ==
== 2021-09-17 ==
* 23:38 logmsgbot: ori Finished scap: scapping to all apaches for --restart test (duration: 07m 03s)
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 logmsgbot: ori Started scap: scapping to all apaches for --restart test
* 21:19 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:24 bblack: nginxes all updated for ssl stapling bugfix
* 19:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 23:24 logmsgbot: ori Finished scap: scapping to scap-test dsh group for --restart test (duration: 06m 02s)
* 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 23:18 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 17:02 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 23:16 logmsgbot: ori scap aborted: scapping to scap-test dsh group for --restart test (duration: 00m 06s)
* 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 23:16 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 16:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 22:14 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: RejectParserCacheValue may pass a WikiPage or Article (duration: 00m 13s)
* 16:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:07 mutante: tmp. disabling puppet on mw1033
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:53 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: (no message) (duration: 00m 15s)
* 16:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:50 logmsgbot: ori Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 12s)
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:40 mutante: starting instance planet1001 on ganeti1003 - can't get console
* 14:49 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 21:40 logmsgbot: legoktm Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 13s)
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 21:36 bd808: updated scap to 33f3002 (Ensure that the minimum batch size used by cluster_ssh is 1)
* 13:06 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 21:34 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: 3c8bb2c493: Update SyntaxHighlight_GeSHi for cherry-pick (duration: 00m 13s)
* 11:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 20:32 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf11
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 logmsgbot: mattflaschen Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change to add Flow_test to enwiki (duration: 00m 11s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:59 logmsgbot: ori scap failed: OSError [Errno 10] No child processes (duration: 01m 46s)
* 09:37 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)
* 19:58 logmsgbot: ori Started scap: (no message)
* 09:37 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency
* 19:52 ori: updated scap to master
* 09:36 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)
* 19:11 ori: running apache graceful-stop on mw1042 to test mod_status behavior during graceful stop
* 09:19 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist
* 19:02 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed) (duration: 03m 50s)
* 08:00 jayme: restarting php-fpm on wtp1037 and wtp1030
* 18:58 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed)
* 02:28 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`
* 18:53 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 (duration: 26m 37s)
* 02:22 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer
* 18:31 godog: start rolling-downgrade of cassandra to 2.1.3 T102015
* 01:55 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`
* 18:27 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11
* 01:48 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - [[phab:T290330|T290330]]"'`
* 18:13 logmsgbot: ori Finished scap: (no message) (duration: 04m 34s)
* 00:04 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 18:11 paravoid: reloading nginx on all cp* for reuseport
* 00:01 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 18:08 logmsgbot: ori Started scap: (no message)
* 17:57 ori: repooled scap-test servers (mw1170-mw1175 and mw1270-mw1275)
* 17:16 logmsgbot: ori Finished scap: (no message) (duration: 01m 42s)
* 17:14 logmsgbot: ori Started scap: (no message)
* 17:10 logmsgbot: ori Finished scap: (no message) (duration: 01m 34s)
* 17:09 logmsgbot: ori Started scap: (no message)
* 17:06 logmsgbot: ori scap aborted: (no message) (duration: 01m 23s)
* 17:04 logmsgbot: ori Started scap: (no message)
* 16:53 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4 (duration: 01m 30s)
* 16:52 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4
* 16:45 cscott: updated OCG to version db7a56965233a74c73917c78b5c8c84c867321d9
* 16:37 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3 (duration: 01m 12s)
* 16:35 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3
* 16:35 bd808: updated scap to da64a65 (Cast pid read from file to an int)
* 16:26 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2 (duration: 01m 26s)
* 16:25 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2
* 16:22 bd808: updated scap to 947b93f (Fix reference to _get_apache_list)
* 16:12 logmsgbot: bd808 scap failed: AttributeError 'Scap' object has no attribute '_get_apache_list' (duration: 02m 15s)
* 16:10 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart
* 16:01 paravoid: staggered upgrade of cp* fleet to nginx 1.9.2
* 15:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: Follow-up 94e5fd2: Default wmgUseContentTranslation true only on Wikipedias [[gerrit:220161]] (duration: 00m 16s)
* 15:49 jynus: rebooting es1004
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX as default except where it is not deployed [[gerrit:220078]] (duration: 00m 12s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable 'frwiki-recommender' campaign in frwiki [[gerrit:220071]] (duration: 00m 13s)
* 14:54 paravoid: reprepro: including nginx 1.9.2-1~bpo8+1 to jessie-wikimedia/backports
* 14:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1003, depool es1004 (duration: 00m 12s)
* 14:04 cscott: reverted OCG to version ca4f64852de5b1de782b292b50038fbd2dd84266 (bundler failing with exit code 8)
* 13:57 cscott: updated OCG to version d7c698d5bf730d34057945e912ac75dc542dd788
* 13:44 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 13s)
* 13:44 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 12s)
* 12:54 moritzm: ssh on precise hosts has been updated to a backport of 6.6p1-2ubuntu2 (the version from trusty). this allows us to use modern crypto (plus labs can simplify key handling)
* 12:45 jynus: rebooting es1003
* 12:18 moritzm: uploaded openssh_6.6p1-2ubuntu2~wmfprecise2 to precise-wikimedia on apt.wikimedia.org
* 12:10 logmsgbot: hoo Synchronized arbitraryaccess.dblist: Arbitrary access for ruwiki and cswiki. T102122 (duration: 00m 12s)
* 11:33 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (part 2/2) (duration: 00m 12s)
* 11:25 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (duration: 00m 12s)
* 09:41 moritzm: updated jsch on gallium and lanthanum to support modern SSH key exchange in Jenkins (actually that happened yesterday, but I forgot to log it back then)
* 09:41 moritzm: added jsch_0.1.50-1ubuntu1~wmfprecise1 to precise-wikimedia on carbon
* 09:09 akosiaris: failing over etherpad to db1016
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 23 04:53:17 UTC 2015 (duration 53m 16s)
* 03:33 springle: xtrabackup clone db2023 to db1045
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-23 02:26:44+00:00
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 47s)
* 01:17 logmsgbot: krinkle Synchronized docroot and w: (no message) (duration: 00m 12s)
* 01:00 bd808: Pruned virt1000 from trebuchet minions list: redis-cli srem "deploy:scap/scap:minions" virt1000.wikimedia.org


== June 22 ==
== 2021-09-16 ==
* 23:42 gwicke: restarted Cassandra on restbase1006
* 23:58 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend: For real this time (duration: 00m 14s)
* 23:51 ryankemper: [[phab:T273673|T273673]] All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: For real this time (duration: 00m 13s)
* 23:44 ryankemper: [[phab:T273673|T273673]] The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 12s)
* 23:39 ryankemper: [[phab:T273673|T273673]] Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend/: SWAT (duration: 00m 15s)
* 23:37 ryankemper: [[phab:T273673|T273673]] Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - [[phab:T273673|T273673]]"'`
* 23:12 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable TinyRGB ICC profile swapping on testwiki (duration: 00m 13s)
* 23:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:51 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki/mediawiki.Title.js: I0e5f2d3b2: Fix undeclared dependency on jquery.mwExtension (duration: 00m 12s)
* 23:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:45 gwicke: restarting Cassandra on restbase1005 to get the metrics back
* 23:19 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 22:37 gwicke: restarting Cassandra on restbase1004 to get the metrics back
* 23:18 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:33 gwicke: restarting Cassandra on restbase1003 to get the metrics back
* 23:18 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:24 gwicke: restarting Cassandra on restbase1002 to get the metrics back
* 23:17 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:19 bd808: scap error "@ERROR: access denied to common from localhost (127.0.0.1)" from mw2187 and mw2080 on sync-file test.
* 23:17 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:17 logmsgbot: bd808 Synchronized README: Testing sync-file after scap update (duration: 00m 12s)
* 23:16 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:08 RoanKattouw: Deployed patch for T103054
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 godog: reboot restbase1008
* 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:56 bd808: updated scap to 81b7c14 (Move dsh group file names to config)
* 22:38 legoktm@deploy1002: Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)
* 21:55 bd808: trebuchet checkout for scap/scap failed on 23 hosts: mw1104, mw1222, mw2009, mw2011, mw2021, mw2028, mw2031, mw2034, mw2069, mw2076, mw2080, mw2086, mw2095, mw2099, mw2120, mw2127, mw2131, mw2136, mw2170, mw2187, mw2189, mw2197, virt1000
* 22:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:50 bd808: trebuchet fetch for scap/scap failed on mw2086.codfw.wmnet, mw1222.eqiad.wmnet and virt1000.wikimedia.org
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:41 gwicke: restarting Cassandra on restbase1001 to get the metrics back
* 22:23 legoktm@deploy1002: Started scap: i18n for restoring deprecated token APIs
* 21:20 ori: Depooled mw1170-mw1175 and mw1270-mw1275 for testing Idddcfe46
* 22:21 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)
* 21:07 chasemp: rebooting mw1101 the hard way
* 22:19 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)
* 20:28 cscott: updated Parsoid to version d488783e
* 22:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)
* 19:34 akosiaris: delete pad:ips from etherpad
* 21:22 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 19:01 jynus: rebooting es1002
* 21:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 18:52 logmsgbot: ori Synchronized php-1.26wmf10/includes/OutputPage.php: I0e5f2d3b2: Construct clean canonical URLs for wiki pages, ignoring request URL (T67402) (duration: 00m 14s)
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:01 legoktm: live-hacking mw1017 to debug T103053
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 mutante: Bugzilla has left the building
* 20:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721610{{!}}Set jQuery migrate to false for wikibooks and Commons (T280944)]] (duration: 00m 56s)
* 16:31 jynus: resetting wikitech-static mysql contents to improve fragmentation
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:26 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1001, depool es1002 (duration: 00m 14s)
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:12 andrewbogott: shutting down virt1000
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.23
* 16:08 andrewbogott: disabling puppet on virt1000
* 18:55 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 ottomata: deploying eventlogging 0.9. This includes changes for arbitrary eventlogging URIs in all eventlogging stages, as well as support for schema based kafka topic URIs.
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/WikiEditor: SWAT: Reduce 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) [[gerrit:219837]] (duration: 00m 13s)
* 18:50 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Default wmgUseWikibaseQuality on beta to true. [[gerrit:219630]] (duration: 00m 14s)
* 18:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:32 hashar: restarting Jenkins
* 18:46 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 jynus: rebooting es1001 for regular maintenance
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 paravoid: powercycled ms-be1002, stuck at console
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (2/2) (duration: 01m 06s)
* 11:12 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1001 (duration: 00m 13s)
* 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/extension.json: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (1/2) (duration: 01m 07s)
* 11:06 _joe_: restarting hhvm on the low-memory appservers (main and api)
* 17:54 volans: turn off lldp agent on NIC (both ports) on ms-be105[1-9],ms-be205[2-6] - [[phab:T290984|T290984]]
* 09:23 hashar: upgrading Jenkins gearman plugin from 0.1.1 to latest master (f2024bd). Restarting Jenkins.
* 17:31 volans: turn off lldp agent on NIC (both ports) on ms-be2051 - [[phab:T290984|T290984]]
* 05:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 22 05:11:22 UTC 2015 (duration 11m 21s)
* 17:09 jynus: deployed extra grants for admin user on s6 primary
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-22 02:31:32+00:00
* 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
* 02:27 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 27s)
* 16:17 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
* 00:44 jgage: restarted gitblit on antimony again
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position) [[phab:T167973|T167973]]
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position)
* 15:52 bd808: marostegui is awesome and made wikitech better today. :)
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech on read-only for maintenance [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P17283 and previous config saved to /var/cache/conftool/dbconfig/20210916-150444-marostegui.json
* 15:03 marostegui: Set wikitech on read-only (from now on all SAL changes will fail) [[phab:T167973|T167973]]
* 14:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 14:35 mutante: reimaging mwmaint2002 to buster ([[phab:T267607|T267607]], [[phab:T245757|T245757]])
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:12 mutante: switching https://noc.wikimedia.org from codfw to eqiad ([[phab:T287539|T287539]], [[phab:T267607|T267607]])
* 13:44 sukhe: homer: running for Gerrit: 721018: set up BGP peering to durum hosts in <nowiki>{</nowiki>eqiad,codfw,esams,ulsfo,eqsin<nowiki>}</nowiki>
* 13:25 effie: pool mw1422 mw1455
* 13:24 effie: pool mw1422 mw1455
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:12 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 01m 04s)
* 13:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 12:08 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T290057|T290057]]
* 12:00 mbsantos: start OSM re-import script in maps2009 (depooled)
* 11:51 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|529f86c5a998820c32e7d7f2d952317080383e05}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 11:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|9e0f6f84240bf621e97806a94a0e786817001668}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: Fixing incorrect deployment of {{Gerrit|01e4450}} for [[phab:T291123|T291123]]. This is supposed to be a no-op. (duration: 01m 05s)
* 11:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23 (wmf/1.37.0-wmf.23 * u+2-2)]$ git rebase &&  git submodule update extensions/AbuseFilter/ # fixing an incorrect deployment that happened in [[phab:T291123|T291123]]
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23/extensions/AbuseFilter (wmf/1.37.0-wmf.23 u=)]$ git co {{Gerrit|0d2bc7ca17b9f767ae5753db7e4e41fd9e7d3531}} # reset repo to expected state, fixing incorrect deploy of a backport in [[phab:T291123|T291123]]
* 11:34 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 11:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:21 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (2/2) (duration: 01m 05s)
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (1/2) (duration: 01m 05s)
* 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:03 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:59 hashar@deploy1002: Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - [[phab:T146416|T146416]] [[phab:T291124|T291124]] (duration: 01m 06s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:21 topranks: Changing default gateway on mw1422 to use VRRP backup (cr2), to determine if tail drops from switches to cr1 is cause of TCP retransmissions.
* 10:14 effie: depool mw1455 for network testing
* 10:11 effie: depool mw1422 for network testing
* 10:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:01 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:00 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:00 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 09:10 moritzm: in-place re-installation of mx2002.wikimedia.org (test VM) to test the new installer key support in the sre.puppet.renew-cert cookbook
* 08:04 moritzm: upgrading scandium to PHP 7.2 backport of patch for enhanced DOM replaceChild/removeChild performance  [[phab:T291052|T291052]]
* 07:48 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
* 05:35 marostegui: Optimize dewiki.logging in codfw [[phab:T287344|T287344]]


== June 21 ==
== 2021-09-15 ==
* 11:28 jynus: restarting apache on mw1110
* 23:02 legoktm: upgrading lists1001 to use postorius 1.3.5
* 06:55 gwicke: restarted bootstrap on restbase1009 earlier today; hardware hasn't died yet
* 22:51 legoktm: uploaded new mailmanclient/postorius packages to apt1001
* 05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 21 05:01:07 UTC 2015 (duration 1m 6s)
* 22:38 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-21 02:27:13+00:00
* 22:03 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 10m 23s)
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 01:39 jgage: restarted gitblit on antimony at 00:43 UTC
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 01:37 Krenair: testing morebots
* 22:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@902529b]: 0.3.85 (duration: 06m 59s)
* 21:56 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.85` on canary `wdqs1003`; proceeding to rest of fleet
* 21:55 ryankemper@deploy1002: Started deploy [wdqs/wdqs@902529b]: 0.3.85
* 21:55 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.85`. Pre-deploy tests passing on canary `wdqs1003`
* 21:42 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs (duration: 02m 07s)
* 21:40 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60e7e515d7034a9f839d78851f1dcc2be3df7f3b}}: Set wmgEchoEnablePush to false explicitly on arbcom_* wikis ([[phab:T291128|T291128]]) (duration: 01m 06s)
* 19:50 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: sync backport for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/721312 (duration: 01m 06s)
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback all wikis to 1.37.0-wmf.23
* 19:07 urbanecm: Re-start server-side upload for 1 video file, likely temporary swift failure ([[phab:T289781|T289781]])
* 19:06 urbanecm: Start server-side upload for 1 video file ([[phab:T287686|T287686]])
* 19:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 00m 55s)
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 18:52 urbanecm: Start server-side upload for 1 video file ([[phab:T289949|T289949]])
* 18:50 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]])
* 18:44 urbanecm: Start server-side upload for 3 large PDF files ([[phab:T290722|T290722]])
* 18:43 legoktm: migrated sitereq-l@ from Google Groups to Mailman ([[phab:T290908|T290908]])
* 18:27 urbanecm: Start server-side upload for 1 video file ([[phab:T290290|T290290]])
* 18:23 urbanecm: Start server-side upload for 1 video file ([[phab:T290685|T290685]])
* 18:21 urbanecm: Start server-side upload for 1 video file ([[phab:T290707|T290707]])
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7620084a1ed92066aa8b29fa609cf6cbb4f799ab}}: Add portrattarkiv.se to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T290581|T290581]]) (duration: 01m 05s)
* 17:39 mutante: thumbor - running puppet on all thumbor hosts, removed cron job systemd-thumbor-tmpfiles-clean, added thumbor_systemd_tmpfiles_clean timer job
* 16:56 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3] (duration: 06m 15s)
* 16:50 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3]
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3] (duration: 00m 07s)
* 16:47 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3]
* 16:45 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3] (duration: 19m 43s)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5002.eqsin.wmnet
* 16:26 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3]
* 16:19 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5002.eqsin.wmnet
* 16:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5001.eqsin.wmnet
* 16:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5001.eqsin.wmnet
* 15:56 urbanecm: Remove 2FA for User:Rho at wikitech, identity verified via a videocall
* 14:50 moritzm: installing lz4 security updates on stretch
* 13:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:33 ottomata: pointing <nowiki>{</nowiki>stats,analytics<nowiki>}</nowiki>.wikimedia.org at analytics-web.discovery.wmnet cname - [[phab:T285355|T285355]]
* 13:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4002.ulsfo.wmnet
* 13:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4002.ulsfo.wmnet
* 13:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4001.ulsfo.wmnet
* 13:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4001.ulsfo.wmnet
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 marostegui: Install 10.4.21-2 on db1125
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 Lucas_WMDE: EU backport+config window done
* 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720983{{!}}Enable change-tags for new edits' proofread status at mulWS (T289140)]] (duration: 01m 06s)
* 11:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:583407{{!}}Don’t check constraints on two property qualifiers (T235292)]] (duration: 01m 11s)
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 09:55 effie: depool wtp1026
* 09:54 effie: depooling mw1312 and mw1319
* 09:46 topranks: Disabling Intel X710 NIC on-board LLDP processing on relforge1003 ([[phab:T290984|T290984]])
* 07:04 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:57 elukey: shutdown ms-be2045 (again) after seeing [[phab:T290881|T290881]]
* 06:02 elukey: powercycle ms-be2045 - no ssh, no remote tty available
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1109 original load', diff saved to https://phabricator.wikimedia.org/P17274 and previous config saved to /var/cache/conftool/dbconfig/20210915-052802-marostegui.json
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17273 and previous config saved to /var/cache/conftool/dbconfig/20210915-043053-marostegui.json


== June 20 ==
== 2021-09-14 ==
* 22:50 bblack: restarted gitblit java service on antimony
* 23:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Re-enable VipsScaler (2 of 2) (duration: 01m 04s)
* 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 20 04:27:14 UTC 2015 (duration 27m 13s)
* 22:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable VipsScaler (1 of 2) (duration: 01m 05s)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-20 02:21:30+00:00
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 02s)
* 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:43 legoktm: legoktm@cumin2001:~$ sudo systemctl reset-failed # clear httpbb_hourly_tests failure, moved to cumin1001
* 22:34 legoktm@deploy1002: Finished scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]]) (duration: 23m 49s)
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:11 legoktm@deploy1002: Started scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]])
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:20 dancy: testing upcoming Scap release on beta
* 20:20 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720387{{!}}Early adopt wgIncludejQueryMigrate=false on nlwiki (T280944)]] (duration: 01m 48s)
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data lyfcttm2lhw4
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data h5mvbny28713
* 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.23
* 18:48 moritzm: removed filter for tcp/25 on mx2001, reimage is complete [[phab:T286911|T286911]]
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2982638039720107d0b6e3227f5dce5b34ce7533}}: Offer the DiscussionTools reply tool as opt-out setting at ptwikinews ([[phab:T285162|T285162]]) (duration: 01m 06s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f1de32f4b5788e92291a5448563bc61a9f561e2}}: Offer the DiscussionTools reply tool as opt-out setting at Wikimania wiki ([[phab:T284339|T284339]]) (duration: 01m 05s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e36f4d3dcc368f0afbce3649ce72f2135ab1c76f}}: DiscussionTools: Make newtopictool available to everyone on arwiki and cswiki ([[phab:T285724|T285724]]) (duration: 01m 04s)
* 18:09 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|Idef64e72}} (duration: 01m 29s)
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:45 moritzm: reimaging mx2001 to bullseye [[phab:T286911|T286911]]
* 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 37 hosts
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.remove-downtime for 37 hosts
* 15:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-update-tendril (exit_code=0)
* 15:11 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-update-tendril
* 15:10 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
* 15:06 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
* 15:05 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17271 and previous config saved to /var/cache/conftool/dbconfig/20210914-150458-marostegui.json
* 15:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:00 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:58 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17270 and previous config saved to /var/cache/conftool/dbconfig/20210914-145522-marostegui.json
* 14:54 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:54 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:53 jelto@cumin2002: END (ERROR) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=97)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17269 and previous config saved to /var/cache/conftool/dbconfig/20210914-145324-marostegui.json
* 14:52 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:49 jelto@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=99)
* 14:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:49 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 14:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:46 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:46 jelto@cumin2002: MediaWiki read-only period ends at: 2021-09-14 14:46:30.570035
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:43 jelto@cumin2002: MediaWiki read-only period starts at: 2021-09-14 14:43:48.272827
* 14:43 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 14:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: DC switchover
* 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: DC switchover
* 14:39 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:39 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:34 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:32 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:30 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 14:24 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:22 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:22 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Avoid warning about undefined $wgFileBlacklist ([[phab:T290640|T290640]]) (duration: 01m 32s)
* 13:44 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 13:43 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 13:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names (duration: 00m 14s)
* 13:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names
* 13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 13:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 13:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1ebdca4]: (no justification provided) (duration: 00m 15s)
* 13:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1ebdca4]: (no justification provided)
* 12:32 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:32 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:29 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 10:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.20 (duration: 01m 48s)
* 09:47 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.19 (duration: 04m 13s)
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 09:38 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.23 (duration: 70m 39s)
* 09:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 09:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 09:09 Emperor: swift rebalance to remove h/w faulty host ms-be2045 [[phab:T290881|T290881]]
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:47 moritzm: installing testvm2002
* 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 08:27 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.23
* 08:25 godog: poweroff ms-be2045 and set it as failed in netbox - [[phab:T290881|T290881]]
* 08:24 hashar: train: applied security patches for 1.37.0-wmf.23 # [[phab:T281164|T281164]]
* 08:05 godog: wipe non-os partitions from ms-be2045 - [[phab:T290881|T290881]]
* 07:50 vgutierrez: update acme-chief to version 0.31 on acmechief hosts - [[phab:T290249|T290249]]
* 04:47 eileen: civicrm revision changed from {{Gerrit|1f071f6c6c}} to {{Gerrit|e6bf81d99c}}, config revision is {{Gerrit|23eda8ba3a}}
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 James_F: wmf/1.37.0-wmf.23 was branched at {{Gerrit|ea72c9b690c2159a12beec2f518b61cc499ed521}} for [[phab:T281164|T281164]]
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== June 19 ==
== 2021-09-13 ==
* 23:32 gwicke: upgraded restbase1006 to cassandra 2.1.7
* 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 gwicke: starting cassandra bootstrap on restbase1009
* 23:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 gwicke: upgraded cassandra on 1003 to 2.1.7 (pre-release, likely going out on Monday)
* 23:45 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: III – Don't set wmgUseVips, now ignored (duration: 00m 58s)
* 18:32 godog: stop cassandra on restbase1008
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:45 logmsgbot: krenair Synchronized private/PrivateSettings.php: sync 4a30446e for wikitech cleanup - T102361 (duration: 00m 12s)
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:24 godog: install linux 3.19 on restbase100[789]
* 23:41 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: II – Don't load regardless of config (duration: 00m 58s)
* 17:12 ori: salt -t30 -G 'php:hhvm' cmd.run 'rm -f /usr/local/bin/check_tc_space' (https://gerrit.wikimedia.org/r/#/c/219102/)
* 19:52 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]] Undeploy VipsScaler: I – Disable on all wikis (duration: 00m 57s)
* 16:54 moritzm: updated/rebooted nescio/maerlant to 3.19
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 andrewbogott: test test test
* 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:19 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-19 02:19:33+00:00
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 08s)
* 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:49 springle: killed storm of research queries on dbstore1002, load avg 90+, replag, likely explosion, etc. emailing analytics@
* 18:59 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=<nowiki>{</nowiki>cswiki,cswikiversity<nowiki>}</nowiki> --signup --ip=185.47.223.49 # [[phab:T290809|T290809]]
* 00:13 logmsgbot: ebernhardson Synchronized php-1.26wmf10/extensions/Flow/tests/: no-op sync of flow test cases in wmf10 (duration: 00m 17s)
* 18:58 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|9db1d1ac938ca053c82fed88c8b6e75f97a52416}}: Add throttle rule for Czech wiki course ([[phab:T290809|T290809]]) (duration: 00m 58s)
* 00:11 logmsgbot: ebernhardson Synchronized php-1.26wmf10/skins/Vector/: Bump Vector submodule in 1.26wmf10 for swat (duration: 00m 12s)
* 18:29 ryankemper: [Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large
* 18:25 razzi: reenable replication on dbstore1007 for [[phab:T290841|T290841]]
* 18:16 cwhite: apply high log volume from ES mitigations to deprecated inputs
* 18:13 razzi: razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for [[phab:T290841|T290841]]
* 18:05 razzi: sudo systemctl restart mariadb@s2.service
* 17:48 ryankemper: [Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83%
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 17:17 ryankemper: [Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered
* 16:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 16:18 legoktm@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
* 16:16 volans@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.*
* 16:08 volans@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.*
* 16:06 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet
* 15:54 moritzm: filtered mx2001 on the routers for reimage [[phab:T286911|T286911]]
* 15:43 vgutierrez: update acme-chief to version 0.31 on acmechief-test hosts - [[phab:T290249|T290249]]
* 15:40 vgutierrez: upload acme-chief 0.31 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 15:32 jelto: Traffic: depool codfw from user traffic
* 15:26 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 15:25 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 15:25 volans@cumin1001: START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet
* 15:20 Emperor: rebooting ms-be2045 to see if that brings the disk back properly [[phab:T290881|T290881]]
* 15:13 jelto@cumin2002: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
* 15:13 legoktm: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 15:13 rzl: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 15:12 jelto@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}eventstreams-internal{{!}}kartotherian{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}shellbox{{!}}shell
* 15:02 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 15:02 topranks: Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm.
* 14:56 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:44 herron: drained mx2001 mail queue to mx1001 [[phab:T286911|T286911]]
* 14:38 dcausse: restarting wdqs-updater.service on all wdqs servers
* 14:21 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 14:20 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 14:13 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 14:13 legoktm: (contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad
* 14:12 jelto@cumin2002: Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex
* 14:12 jelto@cumin2002: START - Cookbook sre.switchdc.services.01-switch-dc
* 14:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 14:05 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet
* 13:51 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet
* 13:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet
* 13:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet
* 13:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet
* 13:20 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet
* 13:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet
* 12:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 kostajh: European mid-day backport window deploys done
* 11:24 kharlan@deploy1002: Synchronized wmf-config: Config: [[gerrit:713553{{!}}WikimediaEvents: Remove UnderstandingFirstDay config]] (duration: 00m 59s)
* 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet
* 09:33 volans: restarting tcpircbot-logmsgbot on alert1001, not relaying messages
* 09:18 elukey: upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2
* 09:16 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 09:11 moritzm: reimaging sretest1002
* 09:11 elukey: upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - [[phab:T277739|T277739]]
* 08:16 godog: bump +100G prometheus/ops codfw


== June 18 ==
== 2021-09-12 ==
* 23:37 logmsgbot: ebernhardson Synchronized php-1.26wmf9/skins/Vector: Bump Vector in 1.26wmf9 for SWAT (duration: 00m 16s)
* 18:33 vgutierrez: restart varnish-fe on cp3061, cp3063 and cp3065
* 23:22 logmsgbot: ebernhardson Synchronized wmf-config/: Actually enable the feedback link on Special:Search (duration: 00m 17s)
* 18:29 vgutierrez: restart varnish on cp3055
* 23:08 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Enable wgCirrusSearchFeedbackLink on enwiki (duration: 00m 13s)
* 18:26 vgutierrez: restart varnish on cp3057
* 21:07 godog: start (bootstrap) cassandra on restbase1008
* 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:43 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-urd-hin_0.1.0+svn~r60389-1
* 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:17 akosiaris: restarted salt on sca1001, truncate log files. keep a sample in /tmp/
* 20:03 chasemp: apache && hhvm restart for mw 1243 1250 1254 1256 1257
* 20:00 chasemp: apache && hhvm restart for mw...1256 1255 1254 1250 1243 1242 1071 1021
* 19:58 mutante: restarting hhvm on mw1021, mw1071
* 19:27 godog: bounce cassandra on restbase1003, new logging configuration
* 19:26 akosiaris: puppet-merged on strontium
* 19:15 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf10
* 19:06 godog: upgrade cassandra to 2.1.6 on restbase1003
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-urd_0.1.0~r57551-1
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hin_0.1.0~r57344-1
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-cy-en_0.1.1~r57554-1
* 18:43 legoktm: fixed content model of MediaWiki:Common.css@lrcwiki
* 18:18 YuviPanda: restarted nutcracker on wikitech
* 18:16 YuviPanda: restarted keystone on labcontrol1001
* 17:13 gwicke: bouncing cassandra on restbase1002
* 17:11 godog: restart cassandra on restbase1004
* 15:53 gwicke: updated restbase to 7ffaf94b
* 15:13 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Hovercards: Disable test release on Catalan and Greek Wikipedias [[gerrit:215932]] (duration: 00m 13s)
* 15:06 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150618 [[gerrit:218886]] (duration: 00m 14s)
* 11:14 akosiaris: powercycling labstore2001
* 09:08 moritzm: added firejail_0.9.26-1~wmfjessie1 and firejail_0.9.26-1~wmftrusty1 to apt.wikimedia.org
* 08:45 jynus: very brief replication stop for s7, already corrected
* 06:51 Coren: rebooting labstore2001
* 06:32 legoktm: live hacking mw1017 for T102915
* 05:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 18 05:26:01 UTC 2015 (duration 26m 0s)
* 02:48 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-18 02:48:44+00:00
* 02:46 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 03s)
* 02:32 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-18 02:32:45+00:00
* 02:28 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 56s)
* 02:04 springle: applied T99941 schema change to all remaining affected (ie, old) wikis
* 02:01 tgr: ran https://gerrit.wikimedia.org/r/#/c/159350/7/backend/schema/mysql/developer_agreement.sql on mediawikiwiki
* 01:32 ejegg: updated payments from f33d0a8687a120a2057a7e6acad67da63b17f97e to a17ee221db0dbde70c92e24fc188379b6dbad613
* 01:20 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: 0c21a14a6e: Revert StashEdit: Use postWithToken (duration: 00m 13s)
* 01:06 twentyafterfour: applied hotfix for T102276 and restarted apache on iridium
* 00:00 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf10


== June 17 ==
== 2021-09-11 ==
* 23:35 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 14s)
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27814b8eaacb5ba2fee1b6167a36ea14356a1ecf}}: testwiki: Fully remove securepoll-related groups ([[phab:T290808|T290808]]) (duration: 00m 57s)
* 23:35 gwicke: rolled back restbase to 90817c2a
* 18:35 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki <nowiki>{</nowiki>electionadmin,electcomm<nowiki>}</nowiki> # [[phab:T290808|T290808]]
* 23:24 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/MobileFrontend: SWAT (duration: 00m 15s)
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|908bbf35235ea4129795dfbf4c0e646440152e18}}: Revert "test: Add electcomm and electionadmin groups" ([[phab:T290808|T290808]]) (duration: 00m 58s)
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/Flow: SWAT (duration: 00m 15s)
* 22:45 gwicke: rolling restart of cassandra nodes
* 22:09 gwicke: rolling restart of restbase instances to apply puppet change after puppet actually ran on all nodes
* 21:58 gwicke: rolling restart of restbase instances to apply config change
* 21:56 godog: restart nutcracker on mw1145
* 21:35 gwicke: restarting cassandra on restbase1005
* 20:47 mutante: temp. stopped icinga-wm
* 20:37 gwicke: deployed RESTBase 7ffaf94bfc
* 20:24 cscott: updated Parsoid to version 402ddf66
* 20:01 ottomata: resized antimony's / LV from 30G to 100G.  looks like /var/lib/git was getting filled up
* 19:43 jynus: rolling schema changes on hewiki
* 19:29 godog: downgrade and restart cassandra to 2.1.3 on restbase1001, metrics not being pushed to graphite with 2.1.6
* 19:05 godog: bounce cassandra on xenon
* 18:46 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic03b152de: Make $wgUploadPath for commons https only for benefit instant commons (duration: 00m 14s)
* 18:11 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf10
* 17:45 godog: bounce cassandra on restbase1001
* 17:39 mutante: repooled mw1234
* 17:24 ottomata: starting reinstall of Zookeeper analytics nodes (analytics102[345]): https://phabricator.wikimedia.org/T101713
* 17:16 godog: bounce cassandra on restbase1001
* 17:14 jynus: rolling schema changes on ruwiki master
* 17:13 mutante: running puppet via salt on api appservers in batches, switch to ganglia_new and carbon
* 17:12 godog: cassandra stopped sending graphite metrics after restart, investigating (test cluster works fine tho)
* 16:58 jynus: rolling schema changes on ruwiki slaves
* 16:28 godog: start upgrading restbase1001 to cassandra 2.1.6 T102015
* 16:02 logmsgbot: thcipriani Finished scap: Wikitech-Ldap host record roll-out (duration: 24m 35s)
* 15:37 logmsgbot: thcipriani Started scap: Wikitech-Ldap host record roll-out
* 15:19 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Give patrolmarks right to "*" on dewiki [[gerrit:218901]] (duration: 00m 13s)
* 15:17 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Add a throttle exception for United Islands of Prague [[gerrit:217413]] (duration: 00m 14s)
* 15:15 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable captcha on labswiki for now [[gerrit:218908]] (duration: 00m 13s)
* 15:10 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add extra namespace aliases for Italian Wikipedia [[gerrit:215708]] (duration: 00m 13s)
* 15:08 anomie: SWAT: Enable anti-abuse features on labswiki [[gerrit:218903]]
* 15:08 jynus: testing some schema changes on testwiki
* 15:00 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on nowiki and plwiki (duration: 00m 13s)
* 13:56 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on fiwiki and idwiki (duration: 00m 13s)
* 13:26 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on bgwiki and eowiki (duration: 00m 13s)
* 10:52 akosiaris: reload pybal on lvs1006
* 10:50 mobrovac: finished deploying mathoid I40ef68 on SCA
* 10:48 akosiaris: repooled mathoid.svc.eqiad.wmnet: sca1002 backend
* 10:44 akosiaris: enable puppet on sca1002
* 10:43 akosiaris: enable puppet
* 10:43 akosiaris: depool sca1002 for mathoid.svc.eqiad.wmnet
* 10:43 akosiaris: reloaded pybal on lvs1003
* 10:28 akosiaris: repool sca1002, depool sca1001
* 10:18 mark: Halting pvmove of md124 on labstore1001
* 09:30 akosiaris: disable puppet on sca1001
* 09:09 akosiaris: depool sca1001, resource: mathoid
* 09:09 akosiaris: puppet disabled on sca1002
* 08:37 YuviPanda: run sudo salt -t 20 -b 100 '*' cmd.run 'sudo service salt-minion restart' on virt1000, attempt to get them to answer on labcontrol1001 instead
* 06:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 17 06:52:58 UTC 2015 (duration 52m 57s)
* 02:56 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-17 02:56:49+00:00
* 02:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1045 (duration: 00m 13s)
* 02:54 springle: found wikiversions.json modified on tin since 2015-06-16 23:27 (catrope?); stashed and reapplied the file in order to do a pull
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 04m 44s)
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-17 02:35:23+00:00
* 02:32 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 06m 12s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf10/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 00:10 paravoid: draining esams because of upcoming network maintenance window


== June 16 ==
== 2021-09-10 ==
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable local upload on fawikivoyage; enable logging for T76305 (duration: 00m 13s)
* 21:28 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 23:28 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: Set previous values for password length policies (duration: 00m 16s)
* 21:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 23:17 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf10 (duration: 43m 04s)
* 21:21 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 23:02 godog: restore INFO cassandra logging level on restbase1003
* 20:46 jhuneidi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:44 godog: start cassandra on restbase1008
* 20:44 jhuneidi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:43 godog: enable back some cassandra debugging on restbase1003
* 20:42 jhuneidi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:33 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 18:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 22:26 urandom: restored default logging level on restbase1003
* 18:08 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 22:22 urandom: enabling even more debugging on restbase1003
* 17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 22:14 urandom: enable (some) debug logging on restbase1003
* 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 21:57 logmsgbot: twentyafterfour scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.SxGNHsmVYP" ' returned non-zero exit status 1 (duration: 01m 24s)
* 16:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 21:56 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 20:34 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents/modules/ext.wikimediaEvents.resourceloader.js: T101806 live hack (duration: 00m 12s)
* 16:14 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 19:24 Coren: labstore1001 pvmove of slice2 to slice 51 started (some bursts of iowait expected, but should have minimal end-user impact)
* 16:03 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 18:36 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Fix usage tracking setting (duration: 00m 14s)
* 15:39 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 18:03 godog: bounce statsite on graphite1001, stuck while writing to graphite
* 15:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 17:30 ejegg: update SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 258f2c917b1ae50b01231927bcd6f58ecaa8940b
* 14:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:23 logmsgbot: krinkle Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoader.php: undo live hack (duration: 00m 13s)
* 14:43 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on gomwiki and lrcwiki (duration: 00m 13s)
* 13:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on second batch of s3 wikis (duration: 00m 13s)
* 09:31 XioNoX: push pfw policies - [[phab:T290611|T290611]]
* 17:03 logmsgbot: bblack Synchronized wmf-config/InitialiseSettings.php: wgCanonicalServer: HTTPS for all (duration: 00m 15s)
* 09:07 mutante: planet - deleted all state files for all languages, running fresh update via systemctl start for all languages after proxy changes ([[phab:T285251|T285251]])
* 16:44 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 08:37 jynus: upgrade and restart db2139
* 16:43 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:43 logmsgbot: krenair Synchronized w/static/images/project-logos/gomwiki.png: (no message) (duration: 00m 14s)
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:42 logmsgbot: krenair Synchronized langlist: gomwiki (duration: 00m 13s)
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:41 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 08:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:40 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 13s)
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:29 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:27 logmsgbot: krenair Synchronized langlist: (no message) (duration: 00m 14s)
* 07:58 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-workers - [[phab:T289766|T289766]]
* 16:25 logmsgbot: krenair Synchronized w/static/images/project-logos/lrcwiki.png: (no message) (duration: 00m 13s)
* 07:57 moritzm: installing ntfs-3g security updates
* 16:21 moritzm: updated copper, oxygen, labstore2001 and labnodepool1001 to the 3.19 kernel
* 07:46 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:11 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 07:45 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:10 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 14s)
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:06 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:05 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 15s)
* 07:25 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-staging - [[phab:T289766|T289766]]
* 15:43 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: templateeditor: add templateeditor right in hewiki [[gerrit:218426]] (duration: 00m 13s)
* 07:19 jayme: imported rsyslog 8.1901.0-1~bpo9+wmf2 to stretch-wikimedia - [[phab:T289766|T289766]]
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn on wgGenerateThumbnailOnParse for wikitech. [[gerrit:218553]] (duration: 00m 12s)
* 06:56 effie: disable puppet on deploy1002 and mw2254
* 15:03 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for CX deployment on 20150616 [[gerrit:218341]] (duration: 00m 12s)
* 06:29 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 14:18 cmjohnson: barium is going down for disk replacement
* 06:27 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 13:38 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on dewiki (duration: 00m 15s)
* 06:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 13:18 akosiaris: rebooted etherpad1001 for kernel upgrades
* 06:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 12:51 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2005, es2006 and es2007 after maintenance (duration: 00m 13s)
* 06:02 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet
* 12:44 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on cswiki (duration: 00m 14s)
* 05:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 logmsgbot: aude Synchronized usagetracking.dblist: Enable usage tracking on ruwiki (duration: 00m 15s)
* 05:56 elukey: powercycle mw2280 - no tty available in mgmt, no ssh, host frozen
* 11:21 paravoid: restarting the puppetmaster
* 05:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
* 11:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 13s)
* 05:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:36 akosiaris: rebooting ganeti200{1..6}.codfw.wmnet for kernel upgrades
* 05:45 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:33 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2005, es2006 and es2007 for maintenance (duration: 00m 14s)
* 05:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:10 YuviPanda: deleted huge puppet-master.log on labcontrol1001
* 05:12 marostegui: Repool clouddb1017:3311
* 08:05 jynus: added m5-slave to dns servers
* 05:12 marostegui: Repool clouddb1013:3311
* 07:52 paravoid: restarting hhvm on mw1121
* 04:49 marostegui: Depool clouddb1013:3311
* 07:52 moritzm: blacklisted the overlayfs kernel module (prevents a reliable local root exploit on all Ubuntu systems). no systems in the fleet had an overlayfs mount present or the kernel module loaded, so there should be no impact on existing systems. Note: This is a bandaid, I'll create a Phab task to deploy this via puppet in the future (and to also blacklist additional desktopy kernel modules which increase our attack
* 04:49 marostegui: Depool clouddb1017:3311
* 07:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1005 (duration: 00m 14s)
* 02:52 eileen: civicrm revision changed from {{Gerrit|83f514f693}} to {{Gerrit|1f071f6c6c}}, config revision is {{Gerrit|23eda8ba3a}}
* 06:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 16 06:24:04 UTC 2015 (duration 24m 3s)
* 00:35 tgr: Deployed patch for [[phab:T290692|T290692]]
* 06:18 godog: restore ES replication throttling to 20mb/s
* 06:13 godog: restore ES replication throttling to 40mb/s
* 06:08 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: unthrottle ES (duration: 00m 14s)
* 05:56 godog: bump ES replication throttling to 60mb/s
* 05:50 manybubbles: ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.
* 05:49 manybubbles: reenabling puppet agent on elasticsearch machines
* 05:46 manybubbles: I expect them to be red for another few minutes during the initial master recovery
* 05:45 manybubbles: started all elasticsearch nodes and now they are recovering.
* 05:41 godog: restart gmond on elastic1007
* 05:39 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)
* 05:25 manybubbles: shutting down all the elasticsearch on the elasticsearch nodes again - another full cluster restart should fix it like it did last time...............
* 05:11 godog: restart elasticsearch on elastic1031
* 03:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-16 02:27:51+00:00
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)
* 00:55 tgr: running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460
* 00:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s)
* 00:46 godog: killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 00:27 godog: kill python stats on cp1052, filling /tmp


== June 15 ==
== 2021-09-09 ==
* 23:42 ori: Cleaning up renamed jobqueue metrics on graphite{1,2}001
* 23:07 brennen: no takers on patches, ending backport & config training window.
* 23:01 godog: killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 21:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 logmsgbot: hoo Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)
* 21:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:18 godog: run disk stress-test on restbase1007 / restbase1009
* 21:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:06 logmsgbot: twentyafterfour Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)
* 21:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:05 logmsgbot: twentyafterfour Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)
* 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)
* 20:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:30 godog: bounce cassandra on restbase1003
* 19:40 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:18 godog: start cassandra on restbase1008, bootstrapping
* 19:37 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 godog: sign restbase1008 key, run puppet
* 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 godog: powercycle restbase1007, investigate disk issue
* 19:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 logmsgbot: ori Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:48 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc4f20437868b39ae2cc4eac8735ecb8bcd93157}}: Growth: Push 44 wikis out of dark mode ([[phab:T289680|T289680]]) (duration: 00m 57s)
* 16:48 godog: powercycle graphite1002, no ssh, unresponsive console
* 18:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 3/3) (duration: 00m 57s)
* 16:19 jynus: upgrading es1005 mysql service while depooled
* 18:22 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 2/3) (duration: 01m 01s)
* 16:12 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)
* 18:21 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 1/3) (duration: 00m 58s)
* 16:10 bblack: pybal restarts complete, all ok
* 18:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 16:09 logmsgbot: thcipriani Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)
* 18:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 15:47 logmsgbot: thcipriani Started scap: SWAT: Openstack manager and language updates
* 18:20 urbanecm@deploy1002: sync-file aborted: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]) (duration: 00m 05s)
* 15:46 bblack: starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness
* 18:18 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 15:11 bblack: rebooting cp3041 (downtimed)
* 18:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:00 _joe_: ES is green
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:38 logmsgbot: aude Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)
* 18:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)
* 18:16 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 13:47 jynus: enabling puppet on all elastic* nodes, should enable also ganglia
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:11 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s)
* 18:12 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php --phab=[[phab:T290582|T290582]] {{!}} tee ~/initwikiconfig.out # [[phab:T290582|T290582]]
* 13:04 _joe_: re-scaling down the recovery index bandwidth in ES to 20 mb/s
* 18:11 urbanecm: Run extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments for wikis in P17258 ([[phab:T290582|T290582]])
* 12:52 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s)
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:54 _joe_: raised the ES index replica bandwidth limit to 60mb
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 akosiaris: migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/config: no-op: {{Gerrit|76c51f2753aed9dc8e06b63de6657c3c94371a3c}}: Standardize indentation in several .yaml files (duration: 00m 58s)
* 11:15 _joe_: raised the max bytes for ES recovery to 40mbps
* 17:29 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 10:49 manybubbles: and we're yellow right now.
* 17:28 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 10:49 manybubbles: the initial primaries stage - the red stage of the rolling restart - recovers quick-ish
* 17:28 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 10:48 manybubbles: soon we should see it go yellow and stay that way while the replicas recover
* 17:26 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 10:48 manybubbles: manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further to the road to recovery
* 17:25 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 10:46 jynus: disabled puppet on all elasticsearch nodes to avoid restarting services and other magic
* 17:22 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 10:44 _joe_: disabled hot threads logging, ganglia on es nodes
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 10:44 manybubbles: started Elasticsearch on all elasticsearch nodes
* 17:21 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 10:38 manybubbles: stopping all elasticsearch servers - going for a full cluster restart.
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 10:11 manybubbles: restarting elasticsearch on elasticsearch1021 - that one is in a gc death spiral
* 17:20 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 09:26 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 09:12 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 17:14 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-09-09 17:14:12.502162
* 07:35 _joe_: attempting a fast restart of elastic1020
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 07:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CirrusSearch/includes/Util.php: I504dac0c3: Add missing 'use \Status;' to includes/Util.php (duration: 00m 13s)
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 04:56 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 15 04:56:39 UTC 2015 (duration 56m 38s)
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 00m 12s)
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-15 02:22:56+00:00
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 02:19 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 46s)
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 17:12 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 17:12 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-09-09 17:12:27.974410
* 17:12 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 17:08 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 17:07 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 17:07 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 17:04 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:04 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 16:58 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 16:57 jelto: start cookbook sre.switchdc.mediawiki eqiad codfw --live-test this will generate some additional SAL logs here
* 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 16:00 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 15:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:28 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:719610{{!}}pipeline: add comment redirecting to correct file]] (duration: 00m 59s)
* 15:24 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 14:47 mutante: planet - deleting all state and lock files for the "en" feeds ([[phab:T285251|T285251]] [[phab:T289984|T289984]])
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2002.wikimedia.org
* 14:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 14:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 14:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 14:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 13:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mx2002.wikimedia.org
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:11 mutante: planet1002 - re-enabling disabled puppet
* 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 10:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 10:47 topranks: Removing peering to old IPs of AS139931 (BSCCL) at Equinix Singapore (cr3-eqsin).
* 10:45 topranks: Removing peering to AS24218 at Equinix Singapore (cr3-eqsin) - network no longer uses this ASN.
* 10:22 volans: upgrading spicerack on cumin1001
* 10:20 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 10:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 09:47 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 09:46 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:37 godog: swift eqiad add ms-be10[64-67] with initial weight - [[phab:T290546|T290546]]
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 09:15 volans: rebooting sretest1001 to test ipmi reboot via spicerack
* 09:15 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 09:09 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 08:59 godog: move swift traffic fully to codfw to rebalance eqiad - [[phab:T287539|T287539]]
* 08:59 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 08:58 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
* 08:56 volans: upgrading spicerack on cumin2002 to test the new release
* 08:50 volans: uploaded spicerack_0.0.59 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:23 jelto: run ansible change 719041 on gitlab1001
* 08:13 jelto: run ansible change 719041 on gitlab2001
* 07:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1002.eqiad.wmnet
* 06:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1002.eqiad.wmnet
* 04:37 ryankemper: [WDQS] Dispatched e-mail to the banned user agent (dailymotion)
* 03:57 ryankemper: [WDQS] Dispatched e-mail to WDQS public mailing list informing them the outage is over; all that's left is the e-mail to the banned UA
* 03:47 ryankemper: [WDQS] Restarting `wdqs-blazegraph` on `wdqs[2001-2008].codfw.wmnet`; if banning the dailymotion UA was sufficient then servers should come back up healthy and not drop back into deadlock
* 03:43 ryankemper: [WDQS] Running puppet agent on `wdqs[2001-2008].codfw.wmnet` to roll out https://gerrit.wikimedia.org/r/719753
* 03:29 ryankemper: [WDQS] There's no clear indication of them being a culprit, but by far the most common user agent is a dailymotion VideocatalogTopic UA (see https://logstash.wikimedia.org/goto/51f238e9010d0220e5d33c6c210be93e)
* 03:12 bstorm: attempting to start replication on clouddb1017 s1 [[phab:T290630|T290630]]
* 03:11 bstorm: stopping and restarting mariadb on clouddb1017 s1
* 03:04 ryankemper: [WDQS] Dispatched email to Wikidata public mailing list about reduced service availability
* 02:36 ryankemper: [WDQS] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&from=1631152574841&to=1631154942992 shows the availability pattern, anywhere we see missing data (null) represents time that blazegraph was locked up and therefore unable to report metrics
* 02:34 ryankemper: [WDQS] For context I glanced at `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo systemctl status wdqs-blazegraph'` before doing the aforementioned restarts and they'd all last restarted between 25-28 minutes ago
* 02:33 ryankemper: [WDQS] Restarting `wdqs-blazegraph` across all of `wdqs2*`
* 00:50 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Don't set default  to Score (try #2) (duration: 00m 58s)
* 00:48 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/Score/includes/Score.php: Use the 'score' Shellbox if configured ([[phab:T290193|T290193]]) (duration: 00m 57s)
* 00:46 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/includes/shell/CommandFactory.php: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 58s)
* 00:45 legoktm@deploy1002: sync-file aborted: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 07s)
* 00:15 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove putenv() for GDFONTPATH (duration: 00m 58s)


== June 14 ==
== 2021-09-08 ==
* 10:39 YuviPanda: running du -d 2 on /srv/project in a screen session on labstore1001
* 22:34 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/717649
* 04:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 14 04:33:20 UTC 2015 (duration 33m 19s)
* 22:24 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/714623
* 02:42 logmsgbot: reedy Synchronized wmf-config/extension-list: noop (duration: 00m 13s)
* 21:55 ryankemper: [WDQS] [[phab:T280247|T280247]] Purged varnish to make sure change took effect: `echo 'https://query-preview.wikidata.org/' {{!}} mwscript purgeList.php` and `echo 'https://query.wikidata.org/' {{!}} mwscript purgeList.php` on `mwmaint1002`
* 02:40 logmsgbot: krenair Synchronized wmf-config/squid-labs.php: sync random labs-only file to test per irc (duration: 00m 13s)
* 21:53 ryankemper: [WDQS] [[phab:T280247|T280247]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719502 and ran puppet-agent on `miscweb*`
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-14 02:21:28+00:00
* 20:49 eileen: civicrm revision changed from {{Gerrit|593d01f4fc}} to {{Gerrit|83f514f693}}, config revision is {{Gerrit|23eda8ba3a}}
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 47s)
* 20:41 legoktm: Successfully published image docker-registry.discovery.wmnet/php7.2-fpm-multiversion-base:1.0.2
* 19:25 Krinkle: krinkle@mw1369 Running some benchmarks in Eqiad on load.php
* 18:27 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 2/2) (duration: 00m 58s)
* 18:26 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 1/2) (duration: 00m 58s)
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bbefce6a3778f159ad68587c830dff4a1da0c792}}: Growth: Remove config that moved on-wiki ([[phab:T290295|T290295]]) (duration: 00m 58s)
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|950a377e5ba6f5d318135e31b36334532d9ae71b}}: Stop setting $wgAbuseFilterParserClass ([[phab:T239990|T239990]]) (duration: 00m 58s)
* 17:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2004.codfw.wmnet
* 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2004.codfw.wmnet
* 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2003.codfw.wmnet
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2003.codfw.wmnet
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2001.codfw.wmnet
* 16:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|796e23c87ccfc48334ab932e13aab4f0ec746bbd}}: updateMenteeData.php: Make it possible to force update (duration: 00m 58s)
* 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719524{{!}}Turn off jQuery migrate on wikisource wikis (T280944)]] (duration: 00m 59s)
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2001.codfw.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 16:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 15:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 15:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 14:57 marostegui: Retroactive: started to warm up eqiad databases
* 14:57 moritzm: installing 4.19.194 kernels on stretch systems with 4.19.x (no reboots yet)
* 14:54 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.2.3 ([[phab:T289802|T289802]])
* 14:53 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 14:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 14:33 moritzm: installing zeromq3 security updates
* 13:50 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15 (duration: 06m 42s)
* 13:44 mbsantos@deploy1002: Started deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15
* 13:38 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.1.5 ([[phab:T289802|T289802]])
* 13:13 brennen: gitlab1001: downtiming alerts for 2.5 hours; upgrading to 14.0.10 ([[phab:T289802|T289802]])
* 12:45 brennen: gitlab: pausing all runners in preparation for upgrade to 14.0.10 ([[phab:T289802|T289802]])
* 11:57 moritzm: installing curl security updates on stretch
* 11:09 jbond: upload statograph_0.1.2
* 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 10:06 jelto: upgrade gitlab2001 to gitlab-ce=14.0.10-ce.0
* 10:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 10:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 09:38 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - [[phab:T210137|T210137]]
* 09:29 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - [[phab:T210137|T210137]]
* 09:09 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - [[phab:T210137|T210137]]
* 07:45 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - [[phab:T210137|T210137]]
* 06:46 ryankemper: [WDQS] Manually running puppet-agent on `miscweb2002.codfw.wmnet,miscweb1002.eqiad.wmnet`
* 06:45 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719185 to rollback query.wikidata.org changes
* 02:59 eileen: civicrm revision changed from {{Gerrit|06ef98593f}} to {{Gerrit|593d01f4fc}}, config revision is {{Gerrit|5f004d94d7}}
* 00:00 legoktm: legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of {{Gerrit|4869d91b0be}} / [[phab:T282303|T282303]]


== June 13 ==
== 2021-09-07 ==
* 19:30 bblack: repooled cp1071, cp3040
* 23:25 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:53 bblack: rebooting cp1071, cp3040 to look at BIOS-level things (depooled, icinga-downed)
* 23:20 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:08 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 12s)
* 23:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719381{{!}}Enable UrlShortener everywhere (T267925)]] (duration: 00m 58s)
* 15:47 paravoid: labstore1001: stopping manage-nfs-volumes daemon
* 23:07 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Config: [[gerrit:716041{{!}}profiler: use seperate pipeline inside k8s pods (T288165)]] (duration: 00m 58s)
* 04:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 13 04:41:57 UTC 2015 (duration 41m 56s)
* 22:29 cstone: SmashPig revision changed from {{Gerrit|afd362b163}} to {{Gerrit|3607b16f83}}
* 03:51 Krinkle: Running deleteEqualMessages.php for sawiki (T45917)
* 20:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715018{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on all wikis (T251480)]] (duration: 00m 59s)
* 03:49 Krinkle: Running deleteEqualMessages.php for cewiki (T45917)
* 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-13 02:20:58+00:00
* 20:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 19s)
* 17:18 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 00:17 gwicke: restarted cassandra on restbase1001
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 00:13 gwicke: restarted cassandra on restbase1002
* 17:01 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 16:39 moritzm: installing jetty9 security updates on buster
* 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 16:30 dancy@deploy1002: Synchronized README: testing (duration: 00m 59s)
* 15:18 akosiaris: run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests
* 15:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:04 jbond: upload python-prometheus-client_0.6.0 to stretch-wikimedia
* 14:50 mutante: snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
* 14:33 mutante: CI - migrating zuul-merger cronjob to systemd timer (contint*)
* 14:23 XioNoX: re-pool esams-eqiad - [[phab:T288503|T288503]]
* 14:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 14:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 14:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 14:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 14:17 marostegui: No more db maintenance on eqiad [[phab:T288594|T288594]]
* 14:08 mutante: alert1001 - temp disabled puppet, stopped icinga-wm
* 14:07 mutante: temp killed icinga-wm because of flooding
* 14:01 Emperor: removing pc2010 from orchestrator [[phab:T289117|T289117]]
* 13:59 Emperor: removing pc2010 from tendril and zarcillo [[phab:T289117|T289117]]
* 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:57 XioNoX: drain esams-eqiad for circuit maintenance - [[phab:T288503|T288503]]
* 13:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:51 jayme: uncordoned kubestage2001
* 13:50 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:49 mutante: mw2264 - scap pulled and repooled after [[phab:T290242|T290242]]
* 13:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
* 13:43 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2010.codfw.wmnet
* 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2010.codfw.wmnet
* 13:21 Emperor: removing pc2009 from orchestrator [[phab:T289116|T289116]]
* 13:21 Emperor: removing pc2009 from tendril and zarcillo [[phab:T289116|T289116]]
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'fix s8 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17248 and previous config saved to /var/cache/conftool/dbconfig/20210907-130244-marostegui.json
* 12:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2009.codfw.wmnet
* 12:51 mvernon@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove old decommissioned pc hosts [[phab:T284825|T284825]] (duration: 01m 02s)
* 12:45 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2009.codfw.wmnet
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17247 and previous config saved to /var/cache/conftool/dbconfig/20210907-122747-marostegui.json
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17246 and previous config saved to /var/cache/conftool/dbconfig/20210907-122708-marostegui.json
* 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
* 11:46 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
* 11:36 awight: EU backport complete
* 11:33 awight@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/CodeMirror/extension.json: Backport: [[gerrit:719170{{!}}Change line numbers default to null (T290226)]] (duration: 00m 59s)
* 11:28 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:717192{{!}}Set template namespace for code mirror line numbering (T290226)]] (duration: 00m 59s)
* 10:51 Emperor: removing pc2008 from orchestrator [[phab:T289115|T289115]]
* 10:49 Emperor: removing pc2008 from tendril and zarcillo [[phab:T289115|T289115]]
* 10:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2008.codfw.wmnet
* 10:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2008.codfw.wmnet
* 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
* 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
* 10:27 Emperor: removing pc1010 from orchestrator [[phab:T289122|T289122]]
* 10:22 Emperor: removing pc1010 from tendril and zarcillo [[phab:T289122|T289122]]
* 10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1010.eqiad.wmnet
* 10:02 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1010.eqiad.wmnet
* 09:46 Emperor: removing pc1009 from orchestrator [[phab:T289120|T289120]]
* 09:26 Emperor: removing pc1009 from tendril and zarcillo [[phab:T289120|T289120]]
* 09:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1009.eqiad.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1009.eqiad.wmnet
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:51 Emperor: removing pc1008 from orchestrator [[phab:T289119|T289119]]
* 08:44 Emperor: removing pc1008 from tendril and zarcillo [[phab:T289119|T289119]]
* 08:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1008.eqiad.wmnet
* 08:31 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1008.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json
* 07:52 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17239 and previous config saved to /var/cache/conftool/dbconfig/20210907-075235-kormat.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json
* 07:37 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17236 and previous config saved to /var/cache/conftool/dbconfig/20210907-073731-kormat.json
* 07:37 godog: +100G for prometheus/k8s codfw
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Start to pool db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 50%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17233 and previous config saved to /var/cache/conftool/dbconfig/20210907-072227-kormat.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17231 and previous config saved to /var/cache/conftool/dbconfig/20210907-070724-kormat.json
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'Fixing db2118's pooling config [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17230 and previous config saved to /var/cache/conftool/dbconfig/20210907-070702-kormat.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json
* 05:15 marostegui: Optimize eowiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:15 marostegui: Optimize vecwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:14 marostegui: Optimize kawiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]


== June 12 ==
== 2021-09-06 ==
* 22:57 ejegg: rolled back SmashPig on listener from 15acdafef9d9682c417632e5ac5a5f2e5380f92e to e1e925c9fc2a60c1e14ef01d8b653dc09512f51f
* 23:52 tstarling@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/SecurePoll/includes/Talliers/STVTallier.php: [[phab:T290000|T290000]] (duration: 00m 58s)
* 22:40 ejegg: updated SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 15acdafef9d9682c417632e5ac5a5f2e5380f92e
* 16:14 Amir1: Deployed patch for [[phab:T290394|T290394]]
* 22:24 godog: upgrade and bounce carbon daemons on graphite2001 to investigate T101572
* 15:01 Emperor: removing pc1007 from orchestrator [[phab:T289118|T289118]]
* 21:16 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I3694489ba: wgCanonicalServer->https for new HTTPS domains (duration: 00m 14s)
* 15:00 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:33 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 13s)
* 14:53 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17226 and previous config saved to /var/cache/conftool/dbconfig/20210906-145341-kormat.json
* 20:32 logmsgbot: krenair Synchronized w/static/images/project-logos/dawiki-200k.png: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 16s)
* 14:50 Emperor: removing pc1007 from tendril and zarcillo [[phab:T289118|T289118]]
* 20:15 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217670/ (duration: 00m 12s)
* 14:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1007.eqiad.wmnet
* 19:28 ejegg: updated SmashPig on payments-listener from f9c3eaa99fa0fe8ef098d0fc876091d3676aa039 to 5a463400bc74706ba7bf6256cd0101014e792acb
* 14:45 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1026.eqiad.wmnet
* 19:28 ejegg: updated SmashPig on payments-listener
* 14:44 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1026.eqiad.wmnet
* 18:47 ejegg: updated SmashPig on payments-listener from 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510 to f9c3eaa99fa0fe8ef098d0fc876091d3676aa039
* 14:36 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 18:45 logmsgbot: faidon Synchronized wmf-config/InitialiseSettings.php: remove wmgHTTPSBlacklistCountries (duration: 00m 12s)
* 14:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1007.eqiad.wmnet
* 18:45 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: remove CanIPUseHTTPS hook (duration: 00m 13s)
* 14:22 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 17:39 moritzm: updated cerium, xenon and praseodymium to 3.19 kernel
* 14:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part II (duration: 00m 57s)
* 17:08 ejegg: enabled queue consumer
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part I (duration: 00m 59s)
* 17:08 ejegg: updated crm from d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f to bd8a00196071ddd04efbff7b30567dd9357c9000
* 14:12 moritzm: installing postgres 9.6 security updates
* 16:53 ejegg: disabled donations queue consumer
* 14:05 gehel: re-pooling wdqs1007, caught up on lag
* 15:52 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: hide prefershttps user pref (duration: 00m 13s)
* 13:56 jbond: update facter networking fact gerrit:715949
* 15:40 logmsgbot: faidon Synchronized docroot/search.wikimedia.org/index.php: unbreak search.wikimedia.org due to HTTPS (duration: 00m 12s)
* 13:51 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:719118{{!}}ProductionServices: fix comment for rdb* servers]] (duration: 00m 58s)
* 15:27 jynus: mysql load issues on labsdb1003, investigating
* 13:42 moritzm: updated thirdparty/gitlab component to 14.0.10 [[phab:T284811|T284811]]
* 13:39 moritzm: updated etcd* to 3.19 kernel
* 13:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 jynus: restarting mariadb at labsdb1003
* 12:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:58 moritzm: updated rdb200* to 3.19 kernel
* 12:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 11:31 jynus: db2068 up but all services and console login unresponsive, powercycling
* 12:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:06 springle: killed a bunch of queries hammering labsdb1003 for days
* 12:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 09:58 moritzm: updated mc2004 to mc2016 to 3.19 kernel
* 12:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 06:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 12 06:06:55 UTC 2015 (duration 6m 54s)
* 12:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:37 logmsgbot: ori Synchronized php-1.26wmf9/extensions/FlaggedRevs: I4cfb47b41: Avoid post-redirect parse for certain edits (duration: 00m 14s)
* 12:06 godog: silence statograph until thurs on alert1001 - [[phab:T290425|T290425]]
* 02:40 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-12 02:40:36+00:00
* 11:58 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=plwiki 'editor' 'editeditorprotected' # [[phab:T230103|T230103]]
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 10m 00s)
* 11:56 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=<nowiki>{</nowiki>hewiki,lvwiki,srwiki,srwikibooks<nowiki>}</nowiki> 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 00:40 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217759 (duration: 00m 15s)
* 11:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 00:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 14s)
* 11:50 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=dewiktionary 'autoreviewprotected' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:48 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=arwiki 'autoreview' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:07 urbanecm: EU B&C window done
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8d7cf8f7c3faaf3773940e96ba0cf599e725237}}: foundationwiki: Create editor group ([[phab:T205352|T205352]]) (duration: 00m 57s)
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f90862be8c7b540065da24c24f2e2ac0df5b9d07}}: Growth: Define wgGEMentorDashboardDiscoveryEnabled ([[phab:T289054|T289054]]) (duration: 00m 58s)
* 11:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/maintenance/renameRestrictions.php: {{Gerrit|18e43ecca7d25d2d93de2f98f3bf5b36f5d4b780}}: renameRestrictions.php: Update protected_titles as well ([[phab:T290398|T290398]]) (duration: 00m 59s)
* 10:39 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 10:38 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 10:22 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 10:17 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:22 gehel: depooling wdqs1007, catching up on lag
* 09:06 gehel: restart blazegraph and updater on wdqs1007
* 08:46 jbond: update networking fact - gerrit:715943
* 07:57 godog: fail sdw on ms-be1062, reported errors
* 07:51 moritzm: installing libssh security updates
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:44 moritzm: installing squashfs-tools security updates
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:28 marostegui: Optimize table mkwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:26 marostegui: Optimize table bewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:23 marostegui: Optimize table dewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:07 marostegui: Stop replication on db2090 (old s4 master) [[phab:T289650|T289650]] [[phab:T288803|T288803]]
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 (current master) from API [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2090 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json
* 05:00 marostegui: Starting s4 codfw failover from db2090 to db2110 - [[phab:T289650|T289650]]
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json
* 04:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]
* 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]


== June 11 ==
== 2021-09-05 ==
* 23:59 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217753 (duration: 00m 16s)
* 18:54 urbanecm: wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # [[phab:T290396|T290396]]
* 23:54 logmsgbot: ori Synchronized php-1.26wmf9/includes/EditPage.php: cf7df757f2: Instrument edit failures (duration: 00m 14s)
* 23:41 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/MobileFrontend: Bump MobileFrontend in 1.26wmf9 for SWAT (duration: 00m 14s)
* 23:40 ejegg: updated civicrm from 7ffe0cefb019828a09c9369187f14518847b5f41 to d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f
* 23:24 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/CirrusSearch/: Fix prefer-recent queries in cirrussearch (duration: 00m 13s)
* 23:02 ejegg: updated SmashPig on the rest of the cluster from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 22:17 godog: temporary bump php memory_limit on magnesium to test T102092
* 22:11 ejegg: updated SmashPig on payments-listener from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 21:54 ori: Widespread TC cache exhaustion again, doing rolling restart of HHVMs
* 21:46 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I3d3ed7647: Test LCStoreStaticArray on test2wiki (duration: 00m 14s)
* 21:01 godog: NPE while trying to make restbase1007 (cassandra 2.1.5) join the cluster, trying matching the same cassandra version (2.1.3)
* 20:57 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: fix last commit, did not have any effect (duration: 00m 16s)
* 20:55 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to f33d0a8687a120a2057a7e6acad67da63b17f97e
* 20:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217688/1 (duration: 00m 13s)
* 20:10 godog: sign restbase1007 puppet key and first puppet run
* 19:10 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217591 (duration: 00m 13s)
* 18:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: beta only change - https://gerrit.wikimedia.org/r/217560 (duration: 00m 12s)
* 18:55 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 14s)
* 18:43 logmsgbot: twentyafterfour Synchronized php-1.26wmf9/includes/AjaxResponse.php: Hotfix Iafff9982bbbee893c13f891901dde88f998db7a6 (duration: 00m 14s)
* 18:16 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf9
* 17:44 ejegg: rolled back payments to 43c7952d2a31deaea97e8319f5612d644dce43c8
* 17:41 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to 15f24d24b150d5d774314b0c1b40ae26a73185f2
* 17:00 moritzm: updated mc200[1-3] to linux 3.19
* 16:28 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use arbitrary access tag (duration: 00m 12s)
* 16:27 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add arbitrary access group tag (duration: 00m 13s)
* 16:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Add dblist for arbitrary access wikis (duration: 00m 13s)
* 16:24 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use usagetracking tag (duration: 00m 13s)
* 16:23 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add usagetracking group tag (duration: 00m 16s)
* 16:23 ori: Scap + deployments exhausted TC cache on Apaches; performed a rolling restart of HHVM
* 16:21 logmsgbot: aude Synchronized usagetracking.dblist: Add dblist for usage tracking wikis (duration: 00m 25s)
* 16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Disable Parsoid update jobs (duration: 00m 14s)
* 16:18 logmsgbot: thcipriani Finished scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]] (duration: 32m 11s)
* 15:46 logmsgbot: thcipriani Started scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]]
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager: SWAT: update OpenStackManager to disable unused sudoer features [[gerrit:217407]] (duration: 00m 13s)
* 15:11 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Make VisualEditor access RESTbase directly on all public wikis [[gerrit:214833]] (duration: 00m 12s)
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150611 [[gerrit:217460]] (duration: 00m 12s)
* 14:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on jawiki (duration: 00m 12s)
* 13:40 _joe_: rolling restart of all the restbase instances
* 13:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on frwiki (duration: 00m 12s)
* 13:32 _joe_: running puppet on all restbase hosts
* 13:19 _joe_: running puppet on restbase1001
* 13:16 _joe_: disabling puppet on restbase hosts in anticipation for merging https://gerrit.wikimedia.org/r/217431
* 13:11 paravoid: removing gdnsd from apt: precise-wikimedia (1.9.0-1~precise1/2.1.0-1~precise1), trusty-wikimedia (2.1.0-1), jessie-wikimedia (2.1.2-1~deb8u1)
* 12:13 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access on Wikivoyage and Wikiquote (duration: 00m 13s)
* 11:48 YuviPanda: reboot labvirt1005 for kernel upgrade
* 11:46 YuviPanda: installing linux-image-generic-lts-vivid on labvirt1005 to get a 3.19 kernel
* 09:51 akosiaris: uploaded ruby-jsduck_5.3.4 and ruby-rkelly-remix_0.0.6 on apt.wikimedia.org/jessie-wikimedia/main
* 08:18 akosiaris: recreating jessie chroots on copper
* 06:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 11 06:21:53 UTC 2015 (duration 21m 52s)
* 04:44 twentyafterfour: upgraded phabricator at 1:50 UTC (belatedly logged...)
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-11 03:01:48+00:00
* 03:00 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 01m 16s)
* 02:59 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 59s)
* 02:43 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-11 02:43:34+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 09m 13s)


== June 10 ==
== 2021-09-04 ==
* 23:23 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Add www.limis.lt to $wgCopyUploadsDomains (duration: 00m 19s)
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json
* 22:07 logmsgbot: twentyafterfour Synchronized php-1.26wmf9/extensions/MobileFrontend/includes/skins/banners.mustache: Deploying https://gerrit.wikimedia.org/r/#/c/217417/ (duration: 00m 16s)
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json
* 20:38 logmsgbot: ori Synchronized php-1.26wmf8/includes/Hooks.php: d6802ad7d6: Avoid section profiling in Hooks::run due to high overhead (duration: 00m 14s)
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json
* 20:37 logmsgbot: ori Synchronized php-1.26wmf9/includes/Hooks.php: e552f4942d: Avoid section profiling in Hooks::run due to high overhead (duration: 00m 17s)
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json
* 20:36 logmsgbot: ori Synchronized php-1.26wmf9/includes/User.php: 2f4f1e279d: Fixed "wfTimestamp() fed bogus time value" errors (duration: 00m 12s)
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json
* 20:36 logmsgbot: ori Synchronized php-1.26wmf8/includes/User.php: 55e18123ca: Fixed "wfTimestamp() fed bogus time value" errors (duration: 00m 15s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json
* 18:07 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: Group1 wikis to 1.26wmf9
* 09:04 elukey: restart wmf_auto_restart_rsyslog.service on puppetdb1002
* 16:14 godog: reboot ms-be2008 to check disk swap config
* 09:00 elukey: `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - [[phab:T273026|T273026]]
* 15:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: retry (duration: 01m 08s)
* 03:02 rzl@cumin2001: dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json
* 15:34 Krenair: sync failed to something like 25 hosts, cannot directly log into any of them either
* 15:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/215030/ - no code change, just docs - should not have to wait 9 days for this (duration: 01m 08s)
* 13:16 moritzm: installed curl security updates on elastic*, wtp*, db*, virt*, labs*, labmon*, labstore*, es*
* 12:38 paravoid: zirconium: rm -rf /var/log2 (last log there from Mar 20th 2014)
* 10:55 jynus: disruption for maintenance starting on labsdb1002 https://lists.wikimedia.org/pipermail/labs-l/2015-June/003766.html
* 03:02 logmsgbot: ori Synchronized php-1.26wmf8/includes/User.php: 55e18123ca: Fixed "wfTimestamp() fed bogus time value" (duration: 01m 07s)
* 03:01 logmsgbot: ori Synchronized php-1.26wmf9/includes/User.php: 2f4f1e279d: Fixed "wfTimestamp() fed bogus time value" (duration: 01m 08s)
* 02:36 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-10 02:35:44+00:00
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 20s)
* 01:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 01m 08s)
* 01:13 logmsgbot: ori Synchronized php-1.26wmf8/extensions/FlaggedRevs: 433fae7f23: Update FlaggedRevs for cherry-picks (duration: 01m 09s)
* 01:10 logmsgbot: ori Synchronized php-1.26wmf9/extensions/FlaggedRevs: 2cfc8c9f2b: Update FlaggedRevs for cherry-picks (duration: 01m 09s)


== June 9 ==
== 2021-09-03 ==
* 23:57 logmsgbot: catrope Synchronized php-1.26wmf8/includes/: Avoid parser cache miss that often occurs post-save (duration: 01m 14s)
* 21:49 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:29 logmsgbot: catrope Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: touch (duration: 01m 08s)
* 20:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoaderOOUIImageModule.php: Fix OOUI image variants (duration: 01m 08s)
* 19:33 krinkle@deploy1002: Finished deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}} (duration: 00m 10s)
* 23:22 ori: Deleting unused metrics on graphite2001 (sum_sq and stddev) as well
* 19:33 krinkle@deploy1002: Started deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}}
* 23:21 logmsgbot: catrope Synchronized php-1.26wmf9/resources/src/mediawiki/mediawiki.js: Add logging for T101806 private modules (duration: 01m 08s)
* 19:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:20 ori: Deleting unused metrics in graphite1001 (sum_sq and stddev)
* 19:04 ryankemper: [[phab:T290330|T290330]] `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)
* 23:19 logmsgbot: catrope Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: Add logging for T101806 private modules (duration: 01m 08s)
* 17:42 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:16 logmsgbot: catrope Synchronized wmf-config/CirrusSearch-common.php: fix total breakage of search in wmf9 (duration: 01m 08s)
* 17:40 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:44 andrewbogott: moving labs-ns0 from virt1000 to labcontrol1001
* 17:35 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:43 andrewbogott: stopping almost everything on virt1000
* 17:17 ryankemper: [[phab:T290330|T290330]] Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw
* 20:31 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf9
* 16:32 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:27 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf9 and rebuild l10n cache (duration: 29m 24s)
* 16:10 gehel: blazegraph (public codfw cluster) will now restart every hour - [[phab:T290330|T290330]]
* 19:58 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf9 and rebuild l10n cache
* 15:53 jbond: enable puppet fleet wide after puppetdb database maintenance - [[phab:T263578|T263578]]
* 19:42 mutante: einsteinium - no console output after reboot command, powercycled, booting again
* 15:21 jbond: create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - [[phab:T263578|T263578]]
* 19:36 mutante: rebooting einsteinium
* 15:17 jbond: create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - [[phab:T263578|T263578]]
* 19:28 mutante: restarted apache on mw1227
* 15:00 jbond: disable puppet fleet wide to perform puppetdb database maintenance - [[phab:T263578|T263578]]
* 17:30 mutante: wikitech-static: installing bunch of package upgrades on the external wikitech-static VM
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:13 cmjohnson1: db1058 replacing failed disk 7
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:20 cmjohnson1: analytics1028 going down for troubleshooting
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:17 kart_: updated cxserver to 4a71145
* 14:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:37 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/Wikidata: SWAT: Update Wikidata - forward compat for usage tracking [[gerrit:216967]] (duration: 01m 17s)
* 14:20 mutante: mw2264 - scap pull
* 15:20 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT take II: Enabled Guided Tour on th.wikipedia [[gerrit:216950]] (duration: 01m 08s)
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:19 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enabled Guided Tour on th.wikipedia [[gerrit:216950]] (duration: 01m 08s)
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150609 [[gerrit:216622]] (duration: 01m 09s)
* 13:11 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 11:09 Krenair: Email set for User:GifTagger@commonswiki per [[phab:T100889]]
* 13:10 dcausse: installing openjdk-8-dbg on wdqs2007
* 09:05 akosiaris: uploaded etherpad-lite_1.5.6-2 on apt.wikimedia.org/jessie-wikimedia/main component
* 13:04 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 08:22 akosiaris: upload etherpad-lite_1.5.6-1 on apt.wikimedia.org, jessie-wikimedia dist, main component
* 13:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun  9 04:34:08 UTC 2015 (duration 34m 7s)
* 12:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-09 02:27:30+00:00
* 12:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 12s)
* 12:32 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet
* 01:42 godog: stop icinga-wm on neon
* 12:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)
* 12:03 joal@deploy1002: Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)
* 11:56 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)
* 11:44 joal@deploy1002: Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]
* 11:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - [[phab:T289050|T289050]]
* 11:37 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 11:36 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)
* 11:35 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 10:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet
* 10:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet
* 10:47 joal@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)
* 10:46 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures
* 10:45 joal@deploy1002: deploy aborted: Deploy latest code on AQS new servers - test after failures (duration: 00m 05s)
* 10:45 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-test): Deploy latest code on AQS new servers - test after failures
* 10:29 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 03s)
* 10:29 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:22 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 55s)
* 10:21 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:17 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 10:16 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:08 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 45s)
* 10:08 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:05 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 10:04 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:02 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 25s)
* 10:01 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:00 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 53s)
* 09:58 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 09:57 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 09s)
* 09:57 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 09:32 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979] (duration: 00m 07s)
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979]
* 09:26 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979] (duration: 17m 36s)
* 09:25 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1025-1026].eqiad.wmnet
* 09:15 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1022.eqiad.wmnet
* 09:13 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:09 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:09 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:09 joal@deploy1002: Started deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979]
* 09:08 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:06 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:53 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:52 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:45 ema: cp-eqsin: clean apt cache to free up some space [[phab:T290305|T290305]]
* 08:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1022.eqiad.wmnet
* 08:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 07:43 legoktm: uploaded pygments 2.10.0+dfsg-1~wmf1 to apt.wm.o in component/pygments
* 07:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from several s3 wikis - [[phab:T289050|T289050]]
* 07:10 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:57 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:45 elukey: run `apt-get clean` on cp5012 to free some space (94% of the root partition used)
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17203 and previous config saved to /var/cache/conftool/dbconfig/20210903-061204-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17202 and previous config saved to /var/cache/conftool/dbconfig/20210903-061138-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17201 and previous config saved to /var/cache/conftool/dbconfig/20210903-055700-root.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17200 and previous config saved to /var/cache/conftool/dbconfig/20210903-055635-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17199 and previous config saved to /var/cache/conftool/dbconfig/20210903-054157-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17198 and previous config saved to /var/cache/conftool/dbconfig/20210903-054131-root.json
* 05:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts pc2007.codfw.wmnet
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17196 and previous config saved to /var/cache/conftool/dbconfig/20210903-052653-root.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17195 and previous config saved to /var/cache/conftool/dbconfig/20210903-052628-root.json
* 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2007.codfw.wmnet
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17194 and previous config saved to /var/cache/conftool/dbconfig/20210903-051149-root.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17193 and previous config saved to /var/cache/conftool/dbconfig/20210903-051124-root.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2138 for upgrade', diff saved to https://phabricator.wikimedia.org/P17192 and previous config saved to /var/cache/conftool/dbconfig/20210903-050423-marostegui.json
* 00:31 tgr@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: Backport: [[gerrit:716491{{!}}fixLinkRecommendationData: Try harder to avoid >10K result sets (T284531)]] (duration: 00m 58s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== June 8 ==
== 2021-09-02 ==
* 23:43 bblack: repooled cp3030/cp1065 in pybal
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704171{{!}}Adding wordmark for ptwikinews mobile and desktop skins (T281591)]] Part II (duration: 00m 57s)
* 23:11 logmsgbot: ebernhardson Synchronized php-1.26wmf8/extensions/UploadWizard/: Bump UploadWizard in 1.26wmf8 for evening SWAT (duration: 01m 09s)
* 23:11 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikinews-wordmark-pt.svg: Config: [[gerrit:704171{{!}}Adding wordmark for ptwikinews mobile and desktop skins (T281591)]] Part I (duration: 01m 14s)
* 22:21 bblack: depooled cp3030, cp1065 in pybal for ipsec
* 21:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:17 subbu: deployed parsoid sha 131554ba
* 21:37 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:18 jynus: RAID degradation (disk failure) on s5 master (db1058), no production impact, replacement on the way
* 21:17 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:13 ottomata: restarted eventlogging services on eventlog1001 after disabling kafka pieces
* 19:57 ejegg: updated fundraising CiviCRM from {{Gerrit|7ac13753c7}} to {{Gerrit|06ef98593f}}
* 16:13 _joe_: powercycling tmh1001, console blank, unresponsive to pings
* 19:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 logmsgbot: thcipriani Synchronized commonsuploads.dblist: SWAT: Revert Temporarily re-enable uploads on Marathi Wikipedia, for real [[gerrit:216719]] (duration: 01m 07s)
* 19:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1021.eqiad.wmnet
* 15:58 logmsgbot: thcipriani Synchronized commonsuploads.dblist: SWAT: Revert Temporarily re-enable uploads on Marathi Wikipedia [[gerrit:216719]] (duration: 01m 08s)
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:40 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/Cite: SWAT: Revert Do all of Cite's real work during unstrip and followup [[gerrit:216715]] (duration: 01m 08s)
* 19:40 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1021.eqiad.wmnet
* 15:19 Coren: T96063: process halted for now as store/backup is unmovable and on slice5
* 19:28 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.21  refs [[phab:T281162|T281162]]
* 15:17 logmsgbot: thcipriani Synchronized w/static/images/project-logos/pflwiki.png: SWAT: Fix transparency of pflwiki logo [[gerrit:216595]] (duration: 01m 08s)
* 18:31 ryankemper: [WCQS] `wcqs100[1-3],wcqs200[1-3]` downtimed until `2021-09-09 20:29:55` (UTC)
* 15:15 akosiaris: disabled ircecho on neon for a while
* 18:28 ryankemper: [WCQS] Merged & deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/713946, going to suppress icinga alerts on `wcqs*` hosts because these are still in the process of being spun up properly and aren't serving traffic or anything
* 14:53 Coren: T96063: starting pvmove from slice5 to slice2
* 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:48 Coren: T96063: dropped volume slice1 from vg store
* 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:46 Coren: T96063: dropped store/project
* 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:44 Coren: starting https://phabricator.wikimedia.org/T96063 on labstore1001
* 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:24 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool es1005 (duration: 01m 08s)
* 17:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:23 Coren: rsync in progress between labstore1001:store/backup and labstore1002:backup/backup (at ionice idle)
* 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:13 Coren: created store/backup snapshot on labstore1001 for backup copy
* 16:57 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:03 moritzm: added strongswan_5.3.0-1+wmf2 to jessie-wikimedia on carbon
* 16:18 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 _joe_: purging squid cache on carbon
* 16:09 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 moritzm: updated mc2* to 2:2.8.17-1+deb8u1
* 16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1020.eqiad.wmnet
* 10:55 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool es1007 (duration: 01m 08s)
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1020.eqiad.wmnet
* 10:27 akosiaris: disabled puppet on uranium, investigating ganglia problems
* 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1019.eqiad.wmnet
* 10:05 akosiaris: ganglia gmetad problems
* 15:31 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 05:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun  8 05:24:08 UTC 2015 (duration 24m 7s)
* 15:28 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-08 02:25:12+00:00
* 15:26 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1019.eqiad.wmnet
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 07s)
* 15:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mc1033.eqiad.wmnet
* 15:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1034.eqiad.wmnet
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17178 and previous config saved to /var/cache/conftool/dbconfig/20210902-150412-root.json
* 14:50 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1034.eqiad.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17177 and previous config saved to /var/cache/conftool/dbconfig/20210902-144908-root.json
* 14:49 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1033.eqiad.wmnet
* 14:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:38 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:35 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17176 and previous config saved to /var/cache/conftool/dbconfig/20210902-143405-root.json
* 14:33 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:32 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:22 moritzm: installing exiv2 security updates
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17175 and previous config saved to /var/cache/conftool/dbconfig/20210902-141901-root.json
* 14:13 moritzm: installing ffmpeg security updates
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17174 and previous config saved to /var/cache/conftool/dbconfig/20210902-140357-root.json
* 14:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:57 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 for upgrade', diff saved to https://phabricator.wikimedia.org/P17173 and previous config saved to /var/cache/conftool/dbconfig/20210902-134838-marostegui.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17172 and previous config saved to /var/cache/conftool/dbconfig/20210902-134448-root.json
* 13:42 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:41 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:36 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:35 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17171 and previous config saved to /var/cache/conftool/dbconfig/20210902-132945-root.json
* 13:29 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 13:24 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 13:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 13:14 jbond: reimage sretest1002 (not sretest1001)
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17169 and previous config saved to /var/cache/conftool/dbconfig/20210902-131441-root.json
* 13:14 jbond: reimage sretest1001
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17168 and previous config saved to /var/cache/conftool/dbconfig/20210902-125937-root.json
* 12:55 jbond: disable puppet fleet wide to roll out 715728
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17167 and previous config saved to /var/cache/conftool/dbconfig/20210902-124434-root.json
* 12:42 marostegui: Upgrade db2119
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17166 and previous config saved to /var/cache/conftool/dbconfig/20210902-124102-marostegui.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17165 and previous config saved to /var/cache/conftool/dbconfig/20210902-122826-root.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17164 and previous config saved to /var/cache/conftool/dbconfig/20210902-121323-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17163 and previous config saved to /var/cache/conftool/dbconfig/20210902-115819-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17162 and previous config saved to /var/cache/conftool/dbconfig/20210902-114315-root.json
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17161 and previous config saved to /var/cache/conftool/dbconfig/20210902-112812-root.json
* 11:26 urbanecm@deploy1002: Synchronized README: testing scap (duration: 01m 06s)
* 11:22 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2264.codfw.wmnet
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 for upgrade', diff saved to https://phabricator.wikimedia.org/P17160 and previous config saved to /var/cache/conftool/dbconfig/20210902-111843-marostegui.json
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3ce5d80eb6f8ad720b5d9c0b6ad7840dd869735e}}: dewiki: Enable Growth features for 30% of newcomers ([[phab:T288420|T288420]]) (duration: 01m 58s)
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 urbanecm: metawiki: Server-side page move from VRT -> Volunteer Response Team ([[phab:T290083|T290083]])
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17158 and previous config saved to /var/cache/conftool/dbconfig/20210902-110022-root.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17155 and previous config saved to /var/cache/conftool/dbconfig/20210902-104518-root.json
* 10:38 mbsantos: REINDEX database gis in maps1009 while it's in depooled state
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17152 and previous config saved to /var/cache/conftool/dbconfig/20210902-103014-root.json
* 10:24 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:23 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:19 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17150 and previous config saved to /var/cache/conftool/dbconfig/20210902-101511-root.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17147 and previous config saved to /var/cache/conftool/dbconfig/20210902-100007-root.json
* 09:57 marostegui: Upgrade db2073
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2073 for upgrade', diff saved to https://phabricator.wikimedia.org/P17145 and previous config saved to /var/cache/conftool/dbconfig/20210902-095601-marostegui.json
* 09:56 hashar@deploy1002: Finished deploy [integration/docroot@973ac8a]: Support listing files on index pages - [[phab:T289196|T289196]] (duration: 00m 07s)
* 09:55 hashar@deploy1002: Started deploy [integration/docroot@973ac8a]: Support listing files on index pages - [[phab:T289196|T289196]]
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17142 and previous config saved to /var/cache/conftool/dbconfig/20210902-092026-root.json
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17141 and previous config saved to /var/cache/conftool/dbconfig/20210902-090523-root.json
* 08:55 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from eowiki,idwiki,plwiki,trwiki - [[phab:T289050|T289050]]
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17140 and previous config saved to /var/cache/conftool/dbconfig/20210902-085019-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17138 and previous config saved to /var/cache/conftool/dbconfig/20210902-083515-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17136 and previous config saved to /var/cache/conftool/dbconfig/20210902-082012-root.json
* 08:14 marostegui: Upgrade db2140
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 for upgrade', diff saved to https://phabricator.wikimedia.org/P17135 and previous config saved to /var/cache/conftool/dbconfig/20210902-081436-marostegui.json
* 07:57 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on huwiki - [[phab:T289050|T289050]]
* 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on arwiki - [[phab:T289050|T289050]]
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:00 marostegui: Stop mariadb on pc2007 before decommissioning [[phab:T289112|T289112]]
* 06:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove pc2007 [[phab:T289112|T289112]] (duration: 01m 06s)
* 06:13 eileen: civicrm revision changed from {{Gerrit|ad37f21a7d}} to {{Gerrit|7ac13753c7}}, config revision is {{Gerrit|5f004d94d7}}
* 04:50 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on ruwiki - [[phab:T289050|T289050]]
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:05 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: {{Gerrit|I63bf1922af593b7a144ef5f6d036f9a5e23cec09}} (duration: 01m 07s)


== June 7 ==
== 2021-09-01 ==
* 23:27 godog: reboot ms-be2008 sdg failed, xfs unhappy
* 23:50 Amir1: mwscript createAndPromote.php --wiki=test2wiki --sysop --force Ladsgroup
* 07:03 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 01m 09s)
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:16 andrewbogott: we did a whole lot of things to labstore1001 while morebots was away
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:14 andrewbogott: service nfs-kernel-server restart on labstore1001
* 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-07 02:25:13+00:00
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 09s)
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|0bd65426494d4df981141650211e27e17c98ee0c}}: fixLinkRecommendationData: stay under 10K search limit ([[phab:T284531|T284531]]) (duration: 01m 06s)
* 23:27 eileen: civicrm revision changed from {{Gerrit|30cd9c1d90}} to {{Gerrit|ad37f21a7d}}, config revision is {{Gerrit|5f004d94d7}}
* 23:25 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|3c7d4ecc699b7c68467a372686f5514375d2b74f}}: fixLinkRecommendationData: Allow --db-table in dry-run mode ([[phab:T283868|T283868]]) (duration: 01m 06s)
* 23:20 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 3/3) (duration: 01m 05s)
* 23:19 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 2/3) (duration: 01m 06s)
* 23:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 1/3) (duration: 01m 06s)
* 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bb7d92c48edf48b94fd628e9e0b5fd6682460373}}: Enable WVUI search on Wikimedia Commons ([[phab:T287215|T287215]]) (duration: 01m 07s)
* 23:04 dpifke@deploy1002: Finished deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts [[phab:T281243|T281243]] (duration: 00m 06s)
* 23:04 dpifke@deploy1002: Started deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts [[phab:T281243|T281243]]
* 22:44 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:42 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:42 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:40 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:39 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 22:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:33 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:33 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:32 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:32 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:30 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:29 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]] (duration: 01m 06s)
* 19:57 twentyafterfour: twentyafterfour@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281162|T281162]]
* 19:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]]
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe1ae2e438841a069dc8dadc9a1850b91863c06a}}: Growth features: Deploy to 100% of newcomers on small wikis ([[phab:T289786|T289786]]) (duration: 01m 06s)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27e85b1f228dccb584b4692f5b1b1354b19625b4}}: nlwiki: Enable link recommendations for all Growth users ([[phab:T285254|T285254]]) (duration: 01m 06s)
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|94b1cca}}: Growth features: Enable for newcomers on two wikis ([[phab:T285254|T285254]], [[phab:T287867|T287867]]) (duration: 01m 09s)
* 17:31 ejegg: updated payments-wiki from {{Gerrit|c4d56178d0}} to {{Gerrit|f9cbf95a12}}
* 16:23 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071] (duration: 00m 06s)
* 16:23 mforns@deploy1002: Started deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071]
* 16:22 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071] (duration: 26m 58s)
* 16:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
* 16:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
* 16:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
* 15:55 mforns@deploy1002: Started deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071]
* 15:35 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 godog: move simone-this-dot from wmf to nda ldap group - [[phab:T289783|T289783]]
* 13:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.20/includes/resourceloader: {{Gerrit|Id7c258841d7816}} (duration: 01m 06s)
* 13:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/includes/resourceloader: {{Gerrit|Id7c258841d7816}} (duration: 01m 49s)
* 13:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:16 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 13:05 mutante: planet1002 - temp removing feed from ad.huikeshoven - seems to cause corrupt state file ([[phab:T289984|T289984]])
* 13:01 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:48 godog: s/webperf/navtiming/
* 12:47 godog: bounce webperf on webperf2001 - [[phab:T290138|T290138]]
* 12:41 mutante: planet1002 - rm /etc/rawdog/en/feeds/39a7970f.state (corrupt) [[phab:T289984|T289984]]
* 12:38 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 11:19 Krinkle: effie restarted php-fpm on parse2007.codfw.wmnet, ref [[phab:T290120|T290120]].
* 10:21 jbond: start filtering more puppet facts G:715461 - [[phab:T263578|T263578]]
* 09:23 marostegui: Drop flaggedrevs_stats and flaggedrevs_stats2 from dewiki [[phab:T289050|T289050]]
* 07:45 ema: deploy Varnish SLO dashboard with grr apply slo_dashboards.jsonnet [[phab:T289036|T289036]]
* 07:05 XioNoX: pfw NAT and ACLs changes - [[phab:T290077|T290077]]
* 06:29 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
* 06:28 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
* 05:25 effie: depool mw2251 mw2255 parse2001 for tests - [[phab:T280497|T280497]]
* 04:41 marostegui: Optimize idwiki.flaggedtemplates [[phab:T290057|T290057]]
* 04:23 marostegui: Optimize arwiki.flaggedtemplates [[phab:T290057|T290057]]
* 04:16 eileen: civicrm revision changed from {{Gerrit|7da3eba4f9}} to {{Gerrit|30cd9c1d90}}, config revision is {{Gerrit|5f004d94d7}}
* 00:53 eileen: civicrm revision changed from {{Gerrit|e567b4c289}} to {{Gerrit|7da3eba4f9}}, config revision is {{Gerrit|5f004d94d7}}


== June 6 ==
== 2021-08-31 ==
* 23:46 subbu: deployed parsoid 5172a446 (cherry-pick of 719c736f) -- hotfix for T101599
* 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 6 05:47:40 UTC 2015 (duration 47m 39s)
* 23:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-06 02:30:24+00:00
* 23:38 eileen: civicrm revision changed from {{Gerrit|718aa9cad3}} to {{Gerrit|e567b4c289}}, config revision is {{Gerrit|7a24870bc7}}
* 02:26 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 10s)
* 23:33 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Revert excimer-k8s pipelines [[phab:T288165|T288165]] (duration: 01m 14s)
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 dpifke@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 23:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 mforns: failed deployment of refinery (v0.1.17) to an-test-coord1001.eqiad.wmnet (scap error)
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b] (duration: 13m 42s)
* 23:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1437d99c1884c0695f02b81b724ec82a2bd3362e}}: Enable link recommendation frontent in dewiki and nlwiki ([[phab:T288420|T288420]], [[phab:T285254|T285254]]) (duration: 01m 06s)
* 23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8997ae5d0b998839853aed2b246f5c88fe9d83eb}}: Fix wgDiscussionTools_sourcemodetoolbar settings (duration: 01m 22s)
* 23:01 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b]
* 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b] (duration: 00m 07s)
* 23:00 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b]
* 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b] (duration: 17m 39s)
* 22:42 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b]
* 21:58 ejegg: switched Adyen to new Checkout integration
* 21:41 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:38 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:34 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]]
* 19:20 brennen: gitlab1001: brief downtime for testing reconfiguration of cas3.session_duration
* 19:05 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]] (duration: 35m 53s)
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:40 ejegg: switched Adyen back to HPP integration
* 18:38 ejegg: updated payments-wiki from {{Gerrit|564daed816}} to {{Gerrit|c4d56178d0}}, switched Adyen to Checkout integration
* 18:30 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.21 refs [[phab:T281161|T281161]]
* 18:24 twentyafterfour: ran `scap prep 1.37.0-wmf.21` and `scap apply-patches --train 1.37.0-wmf.21` refs [[phab:T281162|T281162]]
* 18:05 XioNoX: re-pool eqsin-codfw link
* 16:18 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:14 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@09156c2]: fix core Title redirect loop (duration: 16m 02s)
* 15:52 hnowlan@deploy1002: Started deploy [restbase/deploy@09156c2]: fix core Title redirect loop
* 14:30 jbond: enable puppet fleet wide after performing puppetdb maintenance [[phab:T263578|T263578]]
* 14:29 hashar: Restarting CI Jenkins for plugins upgrade
* 14:19 ottomata: merged change to service_auto_restart.pp that changes the way service names are matched to be more explicit.  tested in deployment prep and nothing bad happened.  Logging in case something bad does happen in prod.  https://gerrit.wikimedia.org/r/c/operations/puppet/+/697605
* 14:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:09 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:05 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:03 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:02 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:01 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:47 jbond: disable puppet fleet wide to perform puppetdb maintenance [[phab:T263578|T263578]]
* 13:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:37 urbanecm: Start `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=nlwiki --verbose` in a tmux session at mwmaint2002
* 13:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 13:06 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 13:04 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:59 urbanecm: [urbanecm@mwmaint2002 ~]$ sudo -u www-data kill 133282 # stop updateMenteeData.php at frwiki
* 12:52 jelto: run kubectl scale deployments.apps -n ci mediawiki-bruce --replicas=0 to stop ImagePulling and reduce io on kubestage1001
* 12:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:38 jbond: sudo  gnt-instance modify --disk add:size=100G  puppetdb2002.codfw.wmnet [[phab:T263578|T263578]]
* 11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb1002.eqiad.wmnet [[phab:T263578|T263578]]
* 11:37 jbond: sudo  gnt-instance modify --disk add:size=100G  puppetdb2002.codfw.wmnet
* 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|53a1856128edb4ec3a5ea8840fb6755a1703f7ac}}: updateMenteeData: Send timing to statsd ([[phab:T278971|T278971]]) (duration: 00m 57s)
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 urbanecm: EU B&C window done
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eb482e3fa88a87166b990fd9b87d0ccbbf971290}}: Offer the DiscussionTools reply tool as opt-out setting at 21 phase 2 Wikipedias ([[phab:T288483|T288483]]) (duration: 00m 57s)
* 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 10:14 marostegui: Optimize huwiki.flaggedtemplates [[phab:T290057|T290057]]
* 10:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 08:39 marostegui: Optimize plwiki.flaggedtemplates [[phab:T290057|T290057]]
* 08:18 marostegui: Optimize cewiki.flaggedtemplates [[phab:T290057|T290057]]
* 08:05 marostegui: Optimize plwiktionary.flaggedtemplates [[phab:T290057|T290057]]
* 07:44 marostegui: Optimize ruwiki.flaggedtemplates [[phab:T290057|T290057]]
* 07:01 XioNoX: drain eqsin-codfw link
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17113 and previous config saved to /var/cache/conftool/dbconfig/20210831-065600-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17112 and previous config saved to /var/cache/conftool/dbconfig/20210831-064056-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17111 and previous config saved to /var/cache/conftool/dbconfig/20210831-062553-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17110 and previous config saved to /var/cache/conftool/dbconfig/20210831-061049-root.json
* 06:06 marostegui: Rename flaggedrevs_stats2 and flaggedrevs_stats on dewiki codfw [[phab:T289050|T289050]]
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17109 and previous config saved to /var/cache/conftool/dbconfig/20210831-055546-root.json
* 03:39 eileen: civicrm revision changed from {{Gerrit|e89504652a}} to {{Gerrit|718aa9cad3}}, config revision is {{Gerrit|cb0a008cad}}
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 eileen: tools revision changed from {{Gerrit|14e4125f73}} to {{Gerrit|1d67c52c12}}


== June 5 ==
== 2021-08-30 ==
* 22:42 godog: powercycle graphite2001, no console no ssh
* 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:06 andrewbogott: restarted apache on virt1000
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:49 ori: Upgrading hhvm-fss on application servers to 1.1.7; expect brief 5xx spike.
* 23:11 urbanecm: Evening B&C done
* 20:14 logmsgbot: demon Synchronized php-1.26wmf8: live hack (duration: 02m 32s)
* 23:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: {{Gerrit|9e2264a0c9a48548da4795b2a5b9d7275d254ac7}}: Instrument Special:MentorDashboard ([[phab:T289369|T289369]]) (duration: 00m 55s)
* 20:10 mutante: apt-get upgrade on terbium
* 23:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: {{Gerrit|9e2264a0c9a48548da4795b2a5b9d7275d254ac7}}: Instrument Special:MentorDashboard ([[phab:T289369|T289369]]) (duration: 00m 57s)
* 19:52 godog: bounce redis on rdb1001/rdb1003 to pick up new slave limits
* 21:56 eileen: civicrm revision changed from {{Gerrit|13bf3a02df}} to {{Gerrit|e89504652a}}, config revision is {{Gerrit|cb0a008cad}}
* 19:51 mutante: chown root:root / on terbium
* 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:50 godog: bounce redis on rdb1002/rdb1004 to pick up new slave limits
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 godog: bounce redis again on rdb1003 after increasing the slave limits more
* 19:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a92e2ae7526717a0a42b825a34b4595e75a544b}}: Fix mediawiki.mentor_dashboard.visits definition (duration: 00m 56s)
* 19:17 godog: bounce redis on rdb1003 after bumping slave limits
* 19:08 tgr: morning deploys done for real
* 19:07 godog: redis master logs show periodic 'cmd=sync scheduled to be closed ASAP for overcoming of output buffer limits.', indicating the slave fails to sync
* 19:06 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715579{{!}}Fix schema definition for mediawiki.mentor_dashboard.visit (T289369)]] (duration: 00m 56s)
* 18:40 godog: spike in redis network traffic starting at ~15:00 UTC, correlates with ocg failures
* 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:01 moritzm: restarted gerrit on ytterbium for java update
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:43 jynus: short lag period on db1049, traffic automatically redirected to other slave and back to normal
* 18:49 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: [[gerrit:715529{{!}}Add mediawiki.mentor_dashboard.visit schema (T289369)]] (duration: 00m 26s)
* 14:07 moritzm: added ubuntu-meta-1.325+wmf1 for trusty-wikimedia to apt.wikimedia.org (T100004)
* 18:48 tgr@deploy1002: Scap failed!: 5/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 14:07 moritzm: added ubuntu-meta-1.267.1+wmf1 for precise-wikimedia to apt.wikimedia.org (T100004)
* 18:43 tgr: morning deploys done
* 12:44 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1007 (duration: 01m 08s)
* 18:43 tgr@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 12:08 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1009 (duration: 01m 08s)
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:30 _joe_: uploaded new HHVM package, installing on mw1025 for testing
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:17 moritzm: added redis_2.6.13-1+wmf1 to precise-wikimedia on apt.wikimedia.org
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:24 moritzm: added redis_2.8.4-2+wmf1 to trusty-wikimedia on apt.wikimedia.org
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun  5 05:22:50 UTC 2015 (duration 22m 49s)
* 18:22 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715568{{!}}GrowthExperiments: Enable link recommendation for dewiki and nlwiki (T288420 T285254)]] (duration: 00m 56s)
* 04:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 01m 08s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-05 02:25:20+00:00
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 09s)
* 18:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714548{{!}}GrowthExperiments: Switch image recommendations flag off (T288797)]] (duration: 00m 57s)
* 01:27 tgr: deploying schema changes for Gather on enwiki, enwikivoyage, hewiki (T98490, T101460)
* 17:44 ryankemper: [WDQS Deploy] Test query passing on `query.wikidata.org` and icinga looks good. This deploy is done.
* 00:08 logmsgbot: catrope Synchronized php-1.26wmf8/vendor/oojs/oojs-ui/php/Tag.php: Fix OOUI fatals (T99210) (duration: 00m 13s)
* 17:12 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 17:10 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a17833c]: 0.3.84 (duration: 08m 16s)
* 17:04 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.84` on canary `wdqs1003`; proceeding to rest of fleet
* 17:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a17833c]: 0.3.84
* 17:02 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.84`. Pre-deploy tests passing on canary `wdqs1003`
* 17:00 ryankemper: [[phab:T289483|T289483]] Pooled `wdqs1013`
* 16:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 16:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 16:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
* 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 16:16 sukhe: running authdns-update for Gerrit 715499
* 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 14:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 14:18 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b17015395cc592e021a4ca8ce6f81b699bb77381}}:  Growth mentor dashboard: Enable beta features only on beta wikis ([[phab:T280307|T280307]]) (duration: 00m 55s)
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f1a178e1d4d7c98a1988da68982f97848f390c68}}: knwiki: Disable wmgNewUserMessageOnAutoCreate ([[phab:T289333|T289333]]) (duration: 00m 57s)
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|6fbcc93f429ff3fbca98aeecdee4f33f022ca7c3}}: Add missing edit*protected rights to $wgAvailableRights (duration: 00m 56s)
* 12:12 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=jvwikisource --backend=local-multiwrite ([[phab:T289860|T289860]])
* 11:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:51 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:48 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 11:47 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 11:31 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:30 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:55 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:53 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:21 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703476{{!}}Set $wgIncludejQueryMigrate to false in group0 (T280944)]] (duration: 00m 57s)
* 09:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 09:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 09:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 09:00 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 08:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 08:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 08:57 godog: +100G to prometheus/global in codfw
* 08:04 vgutierrez: pool cp2027 - [[phab:T289908|T289908]]
* 06:53 elukey: drop an-airflow1001's old airflow logs to free up the almost-full root partition
* 06:38 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 for reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17105 and previous config saved to /var/cache/conftool/dbconfig/20210830-052336-marostegui.json


== June 4 ==
== 2021-08-29 ==
* 23:40 logmsgbot: catrope Synchronized php-1.26wmf8/extensions/MobileFrontend: SWAT (duration: 00m 13s)
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Disable VE A/B test for new accounts on enwiki (duration: 00m 13s)
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:39 ejegg: updated payments from d22e44e3fab2b937707c2776384cb93a49b4cfd3 to 43c7952d2a31deaea97e8319f5612d644dce43c8
* 22:21 ottomata: doing controlled restart of kafka brokers services to apply auto create topic config
* 21:48 jgage: analytics1013 crashed, rebooted
* 21:42 logmsgbot: ori Synchronized php-1.26wmf8/includes/libs/ReplacementArray.php: 1b20d62c26: Revert "awful hack: disable fss on zhwiki only, except on mw1017" (duration: 00m 13s)
* 21:34 ori: performing rolling restart of HHVMs for hhvm-fss upgrade
* 21:27 bd808: restarted logstash and elasticsearch on logstash100[1-3] to pick up latest jre updates
* 18:48 mutante: restarted apache on silver/wikitech
* 18:20 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1009 and master-slave switchover (duration: 00m 13s)
* 18:01 awight: Enabling PayPal audit parser job
* 17:57 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1008 (duration: 00m 15s)
* 17:44 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2008 and its slaves (duration: 00m 13s)
* 17:21 ori: Disabling Puppet and nutcracker on mw1017 to control for parser cache
* 17:18 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2008 and its slaves (duration: 00m 13s)
* 17:17 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1008 (duration: 00m 12s)
* 16:33 logmsgbot: kartik Finished scap: Update ContentTranslation (duration: 09m 17s)
* 16:23 logmsgbot: kartik Started scap: Update ContentTranslation
* 15:54 moritzm: added redis_2.8.4-2+wmf1 to trusty-wikimedia on apt.wikimedia.org
* 15:48 logmsgbot: anomie Synchronized php-1.26wmf8/includes/jobqueue/: SWAT: jobqueue: Record stats on how long it takes before a job is run [[gerrit:215748]] (duration: 00m 14s)
* 15:38 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ApiFeatureUsage everywhere [[gerrit:215901]] (duration: 00m 19s)
* 15:36 logmsgbot: anomie Synchronized wmf-config/CommonSettings.php: SWAT: Remove obsolete 'ValidateExtendedMetadataCache' hook [[gerrit:215900]] (duration: 00m 12s)
* 15:35 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Added staff-recommender campaign [[gerrit:215865]] (duration: 00m 12s)
* 15:30 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150406 [[gerrit:215281]] (duration: 00m 12s)
* 15:12 logmsgbot: ori Synchronized php-1.26wmf8/includes/libs/ReplacementArray.php: Ia5f3dc84605: awful hack: disable fss on zhwiki only, except on mw1017 (duration: 00m 17s)
* 15:09 _joe_: puppet disabled, fss disabled on mw1017
* 14:42 YuviPanda: running sudo sed -i 's/GlobalSign_CA.pem/ca-certificates.crt/' /etc/ldap/ldap.conf on all labs nodes
* 14:36 awight: Disable PayPal audit parsing job
* 12:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072, warm up (duration: 00m 13s)
* 05:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun  4 05:11:32 UTC 2015 (duration 11m 31s)
* 02:30 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-04 02:28:54+00:00
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 22s)


== June 3 ==
== 2021-08-28 ==
* 23:42 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing ImportSource change for meta (duration: 00m 13s)
* 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:34 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile, take 2 (duration: 00m 12s)
* 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:26 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile (duration: 00m 12s)
* 09:12 elukey: powercycle cp2027 - OEM event registered in racadm getsel, no tty, no ssh
* 23:25 logmsgbot: kaldari Synchronized images/mobile/mediawiki.png: syncing mediawiki logo for mobile (duration: 00m 12s)
* 09:11 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet
* 22:02 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on ukwiki and viwiki (duration: 00m 15s)
* 21:58 mutante: restarted gitblit
* 21:53 logmsgbot: ori Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoader.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (did not sync correct file previously) (duration: 00m 12s)
* 21:20 andrewbogott: restarting pdns on virt1000 and labcontrol1001
* 21:05 Jamesofur: decryption key for Board Election inserted into voteWiki
* 20:58 bblack: repooling ns0 -> radon AuthDNS
* 20:55 bblack: depooling ns0 -> radon AuthDNS (rebooting for kernel update)
* 20:50 hashar: restarted zuul entirely to remove some stalled jobs
* 20:29 paravoid: kafka preferred-replica-election on an1021
* 20:28 hashar: Restarting Jenkins to release a deadlock
* 20:23 logmsgbot: ori Synchronized php-1.26wmf8/resources/Resources.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (duration: 00m 13s)
* 20:19 subbu: deployed parsoid sha ab675400
* 19:08 bblack: changed ops/puppet repo to ff-only in gerrit config, feel free to scream/revert if necc!
* 18:46 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: All wikis to 1.26wmf8, no new branch until next Tuesday, June 9th
* 18:42 logmsgbot: twentyafterfour Finished scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2) (duration: 07m 14s)
* 18:35 logmsgbot: twentyafterfour Started scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2)
* 15:16 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Remove references to $wgEchoCohortInterval (duration: 00m 12s)
* 15:16 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Change default extension distributor branch to REL1_25 (duration: 00m 15s)
* 15:15 bblack: repooling ns1->baham DNS traffic
* 15:07 bblack: depooling ns1->baham DNS traffic for kernel update
* 15:00 moritzm: added linux 3.19.3-5 for jessie-wikimedia on apt.wikimedia.org
* 14:46 bblack: restarted hhvm on mw1195, seems to be a case of https://phabricator.wikimedia.org/T89912
* 14:32 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on huwiki (duration: 00m 12s)
* 14:29 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2008, es2009 and es2010 (duration: 00m 14s)
* 14:10 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on eswiki (duration: 00m 13s)
* 13:38 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2008, es2009 and es2010 (duration: 00m 14s)
* 13:12 paravoid: reimaging rubidium with trusty, as spare
* 13:02 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on arwiki and cawiki (duration: 00m 15s)
* 12:56 paravoid: permanently switching ns0 to radon instead of rubidium
* 12:53 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2009 (duration: 00m 15s)
* 11:04 paravoid: kafka preferred-replica-election on an1021
* 10:55 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2009 (duration: 00m 13s)
* 10:43 paravoid: powercycling ms-be1005
* 10:28 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool es2010 (duration: 00m 14s)
* 10:24 moritzm: added linux-meta 1.2 for jessie-wikimedia on carbon.wikimedia.org
* 10:09 hashar: Jenkins: refreshing all jobs to get rid of an obsolete http notification to Zuul {{bug|T93321}}
* 09:48 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool es1008 (duration: 00m 13s)
* 09:00 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2010 (duration: 00m 13s)
* 08:51 moritzm: removed fuse/ntfs-3g from wtp*
* 07:47 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool es1008 (duration: 00m 14s)
* 05:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun  3 05:41:31 UTC 2015 (duration 41m 30s)
* 02:50 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-03 02:48:55+00:00
* 02:45 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 06m 37s)
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-06-03 02:27:38+00:00
* 02:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 12s)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 07m 13s)
* 01:57 springle: replicate m3 to codfw dbstore2001
* 01:37 springle: start sync m4 eventlogging to codfw dbstore2002
* 00:35 logmsgbot: mattflaschen Synchronized php-1.26wmf8/extensions/Calendar/: Sync Calendar 1.26wmf8 for module position (duration: 00m 12s)
* 00:20 logmsgbot: mattflaschen Synchronized php-1.26wmf8/includes/User.php: Fixed $flags bit operation precedence fail in User::loadFromDatabase() (duration: 00m 14s)


== June 2 ==
== 2021-08-27 ==
* 23:56 logmsgbot: mattflaschen Synchronized php-1.26wmf8/extensions/Flow/: Sync Flow 1.26wmf8 for import fix (duration: 00m 15s)
* 16:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 23:43 logmsgbot: mattflaschen Synchronized wmf-config/InitialiseSettings.php: Disable WikiGrok (duration: 00m 13s)
* 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 23:33 logmsgbot: mattflaschen Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoaderStartUpModule.php: Don't cache minification of user.tokens (duration: 00m 15s)
* 14:50 akosiaris: stop flink on staging cluster to verify some IOPS starvation issues
* 23:33 logmsgbot: mattflaschen Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoader.php: Don't cache minification of user.tokens (duration: 00m 13s)
* 14:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 23:33 logmsgbot: mattflaschen Synchronized php-1.26wmf8/includes/OutputPage.php: Don't cache minification of user.tokens (duration: 00m 14s)
* 14:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 23:31 logmsgbot: mattflaschen Synchronized php-1.26wmf7/includes/resourceloader/ResourceLoaderStartUpModule.php: Don't cache minification of user.tokens (duration: 00m 13s)
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 23:31 logmsgbot: mattflaschen Synchronized php-1.26wmf7/includes/resourceloader/ResourceLoader.php: Don't cache minification of user.tokens (duration: 00m 14s)
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 23:31 logmsgbot: mattflaschen Synchronized php-1.26wmf7/includes/OutputPage.php: Don't cache minification of user.tokens (duration: 00m 13s)
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 21:44 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I263aa9542: Set $wgExtDistUseEventLogging = true; (duration: 00m 13s)
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 21:43 logmsgbot: ori Synchronized php-1.26wmf8/extensions/ExtensionDistributor: cdd033e7d8: Update ExtensionDistributor for cherry-picks (duration: 00m 13s)
* 14:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 19:24 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: I7810b72d5: Sample profiling data at 1:10,000 (duration: 00m 12s)
* 14:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 19:19 logmsgbot: ori Synchronized wmf-config: I35255f357 and I026dfdbf68 (duration: 00m 12s)
* 14:37 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 19:15 logmsgbot: aude Synchronized wmf-config/Wikibase.php: bump cache epoch for wikidata (duration: 00m 13s)
* 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 19:06 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: wgMaxCredits to 0 (duration: 00m 13s)
* 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 18:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: Group1 wikis to 1.26wmf8
* 13:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:46 robh: sodium has resumed normal service. all items on https://phabricator.wikimedia.org/T100711 addressed
* 12:49 mutante: rsynced /srv/org/wikimedia/racktables from miscweb1002 to miscweb2002 ([[phab:T269746|T269746]])
* 17:56 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool es1010 (duration: 00m 12s)
* 12:04 topranks: removing peering to Wave Division Holdings / AS11404 at Equinix Chicago cr2-eqord, AS no longer on exchange.
* 17:18 robh: mailing list traffic halted for list renames
* 10:56 akosiaris: sudo cumin 'mw*' 'ip ro ls dev docker0 && sysctl net.ipv4.ip_forward=0' to clear up the docker remnants of the dragonfly evaluation. [[phab:T286054|T286054]]
* 17:07 robh: lists.wikimedia.org is now sha256 cert
* 10:31 godog: bounce logstash on logstash1007
* 17:04 robh: starting the lists.wikimedia.org certificate update, archives will be offline during this process
* 10:22 elukey: fallback codfw ores to rdb2007 after maintenance
* 15:44 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool es1010 (duration: 00m 13s)
* 10:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 15:03 logmsgbot: thcipriani Synchronized wmf-config/wikitech.php: SWAT: No longer set use_dnsmasq for new instances. [[gerrit:215317]] (duration: 00m 12s)
* 10:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 12:31 twentyafterfour: merged https://gerrit.wikimedia.org/r/#/c/214288/ and deployed scap
* 09:49 elukey: restart ores uwsgi/celery workers to failover rdb2007 to rdb2008 (and ease the reboot of rdb2007)
* 12:18 moritzm: installed linux-tools-3.19.8-1 for jessie-wikimedia on carbon
* 09:33 topranks: Running homer against mr1-ulsfo to force OOB interface to 100Mb/full-duplex - [[phab:T288343|T288343]]
* 07:36 logmsgbot: nikerabbit Synchronized wmf-config/InitialiseSettings.php: Fixed wiki id for fiu_vro for CX beta feature (duration: 00m 13s)
* 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 05:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun  2 05:39:57 UTC 2015 (duration 39m 56s)
* 09:25 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 02:49 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-02 02:48:23+00:00
* 09:23 cmooney@deploy1002: Finished deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]] (duration: 01m 28s)
* 02:44 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 45s)
* 09:21 cmooney@deploy1002: Started deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]]
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-06-02 02:27:42+00:00
* 08:05 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 26s)
* 07:49 jayme: stopped kube-apiserver on kubestagemaster2001 for testing
* 02:06 logmsgbot: krinkle Synchronized php-1.26wmf7/resources/src/mediawiki/mediawiki.js: backport rl-fix I717b86573 (duration: 00m 14s)
* 07:49 jayme: stopped kube-apiserver on kubestage2001 for testing
* 00:33 ejegg: updated payments-wiki from a4fef65ec1dd3db1fb1d7ceb797b2c7485c722d2 to d22e44e3fab2b937707c2776384cb93a49b4cfd3
* 07:00 godog: bounce logstash on logstash1008
* 00:07 ori: Updated jobrunner for I1d351d8d1: Made periodictasks stats calls more useful
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:02 logmsgbot: ori Synchronized php-1.26wmf8/extensions/RSS/RSSParser.php: Ice44740fb: Don't rely on strip marker uniqueness (T10104) (duration: 00m 14s)
* 06:41 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 00:01 logmsgbot: ori Synchronized php-1.26wmf7/extensions/RSS/RSSParser.php: Ice44740fb: Don't rely on strip marker uniqueness (T10104) (duration: 00m 13s)
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates ([[phab:T289825|T289825]]) (duration: 01m 06s)


== June 1 ==
== 2021-08-26 ==
* 23:36 mutante: restarted gitblit ..
* 22:06 legoktm: restarted mailman3-web on lists1001 ([[phab:T289798|T289798]])
* 23:15 ori: Deployed jobchron / jobrunner change Icab05090b and restarted jobchron / jobrunner on job queue runners.
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:51 ejegg: updated payments from 60c160110a20cf763b82677ff1501e9ce0c919bc to a4fef65ec1dd3db1fb1d7ceb797b2c7485c722d2
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:36 godog: doing some local testing on carbon for T100636 fwiw, thus puppet disabled
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
* 21:35 ejegg: update paymentswiki from aa66797553fbcfb63f7cf29abccc44d060b65db0 to 60c160110a20cf763b82677ff1501e9ce0c919bc
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:13 logmsgbot: ori Synchronized php-1.26wmf7/languages/LanguageConverter.php: 1d054ce6d3: Use a fixed marker prefix string in the Parser and MWTidy (duration: 00m 14s)
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:40 logmsgbot: ori Synchronized php-1.26wmf8/languages/LanguageConverter.php: 1d054ce6d3: Use a fixed marker prefix string in the Parser and MWTidy (duration: 00m 13s)
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 twentyafterfour: disabled several no-longer-existent repositories in phabricator which apparently have been deleted in gerrit
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:26 subbu: deployed parsoid sha 73445bfd
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66717bc039f40336144dcc0dfd97ff5331b418e9}}: Install Extension Quiz on ja.wikibooks ([[phab:T289383|T289383]]) (duration: 01m 05s)
* 20:05 twentyafterfour: restarted apache2 and phd on iridium (phabricator)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 MaxSem: Repopulated gis.spatial_ref_sys on labsdb1004 with postgis 2.1 data, old contents backed up as spatial_ref_sys_bak
* 18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 18:55 logmsgbot: ori Synchronized php-1.26wmf7/extensions/SemanticForms/includes/SF_FormUtils.php: I7ed3996a1: Stop using StripState (duration: 00m 13s)
* 18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 18:55 logmsgbot: ori Synchronized php-1.26wmf8/extensions/SemanticForms/includes/SF_FormUtils.php: I7ed3996a1: Stop using StripState (duration: 00m 15s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:46 yurik: deployed graphoid service update - grafana logging cleanup
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cde88918b73628f2eaaff919ddb869b4dc2c93c6}}: Install Extension Quiz on fa.wikibooks ([[phab:T289381|T289381]]) (duration: 01m 07s)
* 16:40 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool pc1003 (duration: 00m 15s)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: T99491, T100925: Sysops to add users to import group on maiwiki, newiki (duration: 00m 14s)
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/CodeReview: SWAT: Backport CodeReview module position fix [[gerrit:215043]] (duration: 00m 13s)
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4340e9c18468d14885c8ced87f1e014a3481f2a}}: Finalize Event Platform migration of EchoEmail and EchoInteraction ([[phab:T287210|T287210]]) (duration: 01m 07s)
* 15:24 logmsgbot: thcipriani Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoaderWikiModule.php: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 15s)
* 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/WikiEditor: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 13s)
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/VectorBeta: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 15s)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/SyntaxHighlight_GeSHi: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 14s)
* 17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
* 15:20 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/MobileFrontend: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 13s)
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/Gather: SWAT: Make ResourceLoaderWikiModule support custom position [[gerrit:214741]] (duration: 00m 13s)
* 17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 14:42 cmjohnson1: powering down analytics1028 to swap the bad DIMM
* 17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: [[gerrit:714864{{!}}PageStore: Pass query flags to getPageById() too (T289717 T195069)]] (duration: 01m 05s)
* 14:38 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool pc1003 (duration: 00m 12s)
* 16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:48 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access on wikisource and itwiki, and make other projects sidebar feature default for ptwiki (for real) (duration: 00m 12s)
* 16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:45 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access on wikisource and itwiki, and make other projects sidebar feature default for ptwiki (duration: 00m 15s)
* 15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
* 13:31 logmsgbot: aude Synchronized php-1.26wmf8/extensions/Wikidata: css compatibility fixes for wmf8 (duration: 00m 24s)
* 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:00 logmsgbot: krenair Synchronized php-1.26wmf8/extensions/WikimediaMessages/WikimediaMessages.hooks.php: https://gerrit.wikimedia.org/r/#/c/215011/ - fix EditPageCopyrightWarning (duration: 00m 16s)
* 15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:22 moritzm: added firmware-nonfree 0.44~wmf1 for jessie-wikimedia on carbon
* 14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 ([[phab:T289249|T289249]])
* 09:32 yurik: deployed latest graphoid service to sca100x
* 13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 08:18 hashar: Jenkins: upgrading git plugin from 1.5.0 to latest
* 13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 08:12 mobrovac: restbase restart cassandra on restbase1006
* 13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 08:09 mobrovac: restbase restart cassandra on restbase1005
* 12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 08:07 mobrovac: restbase restart cassandra on restbase1004
* 12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:05 mobrovac: restbase restart cassandra on restbase1003
* 12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:00 mobrovac: restbase restart cassandra on restbase1002
* 12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - [[phab:T289536|T289536]]
* 07:59 mobrovac: restbase restart cassandra on restbase1001
* 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun  1 05:18:18 UTC 2015 (duration 18m 17s)
* 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:47 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-01 02:46:32+00:00
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:43 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 37s)
* 11:40 Lucas_WMDE: EU backport+config window done
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-06-01 02:26:03+00:00
* 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 35s)
* 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714853{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 04s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714854{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 05s)
* 11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
* 11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
* 11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:714770{{!}}Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages]] (duration: 01m 05s)
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:09 vgutierrez: rolling restart of varnishkafka-statsv - [[phab:T289618|T289618]]
* 10:07 vgutierrez: disable puppet on cp-text to merge {{Gerrit|I52cf2a573980e33487d1f05f19b192ae7d13d717}} - [[phab:T286038|T286038]]
* 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
* 09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - [[phab:T286038|T286038]]
* 09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
* 09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
* 09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
* 09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
* 09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
* 09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
* 09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 08:52 vgutierrez: restart varnishkafka-statsv on cp4032
* 06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 06:48 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
* 06:43 marostegui: Reimage s4 eqiad master (db1138), expect lag on eqiad [[phab:T288803|T288803]]
* 06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox


== May 31 ==
== 2021-08-25 ==
* 22:35 jgage: graphite2001 keeps falling off the net due to OOM; swap 100% in use. dist-upgraded & rebooted. dmesg in ~gage/dmesg.2015-05-31
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 logmsgbot: krinkle Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: rl live fix - I717b86573 (duration: 00m 12s)
* 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:36 Krinkle: Confirmed RL problem solved. The jquery|mediawiki&version=bizqqnC request was cached with an old mw.loader implementation somehow. After the touch and sync, the version is now dQAzAsdU and the implementation is up to date.
* 23:20 urbanecm: Evening B&C window completed
* 17:33 logmsgbot: krinkle Synchronized php-1.26wmf7/resources: touch mediawiki.js (duration: 00m 13s)
* 23:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GlobalWatchlist/modules/EntryLog.js: {{Gerrit|230aec3fe7f3d0e325882a5fc926e9f3e4e86717}}: GlobalWatchlistEntryLog: fix storing log id ([[phab:T288385|T288385]]) (duration: 01m 07s)
* 17:20 Krinkle: Investigating RL issues (clients are loading mediawiki.notification&version=19700101T000000Z, mw.loader.moduleRegistry contains NaN for versions)
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:12 gwicke: performed a rolling restart of RESTBase Cassandra nodes to address elevated request error rates apparently related to schema disagreement
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun May 31 05:34:36 UTC 2015 (duration 34m 35s)
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:47 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-05-31 02:46:41+00:00
* 22:10 legoktm@deploy1002: Synchronized debug.json: List primary DC servers first ([[phab:T289246|T289246]]) (duration: 01m 04s)
* 02:43 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 51s)
* 22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-05-31 02:25:44+00:00
* 22:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/includes/Content/BoardContent.php: {{Gerrit|694b94657d251df64145e8153b269094bba75be9}}: BoardContent: Fix deprecation warning ([[phab:T289625|T289625]]) (duration: 01m 04s)
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 41s)
* 22:04 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditor.php: {{Gerrit|73478bc9c72286123cef69e57e0aef9e745dcff9}}: Make sure params is an array ([[phab:T289730|T289730]]) (duration: 01m 04s)
* 22:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:59 brennen: 1.37.0-wmf.20 train status ([[phab:T281161|T281161]]) blockers should be patched shortly; as we've reached the 15:00 Pacific deploy cutoff for the day, train will resume first thing in US morning
* 21:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|cc04b33dec6b9aed1d7621957c4de527266600d1}}: EventDispatcher: Try really, really hard to read from master ([[phab:T289717|T289717]]) (duration: 01m 04s)
* 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: {{Gerrit|34fb2b99104d0a2bda8aa202f4cdeb07cb983531}}: PageStore: Pass query flags to getPageByName() ([[phab:T289717|T289717]]; [[phab:T195069|T195069]]) (duration: 01m 06s)
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:14 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: {{Gerrit|190d8b7579af981cf2f5e4a6d9457ee0a7edca3f}}: Use Parser::getUserIdentity() instead of ::getUser() in SimpleCaptcha ([[phab:T289731|T289731]]) (duration: 01m 05s)
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:03 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ProofreadPage/: {{Gerrit|913043a5ca7982e07ab0c01f88076af866a43cc3}}: Fixes exception thrown by FilePagination::getPageNumber ([[phab:T289728|T289728]]) (duration: 01m 06s)
* 20:02 brennen: 1.37.0-wmf.20 ([[phab:T281161|T281161]]) status: blocked at group0; 2/3 blockers have probable patches, all seem to be getting attention, so holding off on blocker mail for now.
* 19:54 urbanecm: enwikisource: Start server-side upload for one video file ([[phab:T289698|T289698]])
* 19:45 urbanecm: Start server-side upload for ~2 GB tiff file ([[phab:T289711|T289711]])
* 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:28 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
* 19:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 04s)
* 19:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 19:10 eileen: tools revision changed from {{Gerrit|15bfaa7117}} to {{Gerrit|14e4125f73}}
* 18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:42 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/modules/editor/editors/visualeditor/ui/inspectors/mw.flow.ve.ui.MentionInspector.js: {{Gerrit|dd464b4522effbfabea371f8b95b0b25d53da43e}}: Fix reference to renamed abortAllApiRequests method ([[phab:T289648|T289648]]) (duration: 01m 04s)
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/skins/WikimediaApiPortal/src/Component/NotificationAlertComponent.php: {{Gerrit|a5bfcc8def96ad1b44fff31c4c1965311be2982a}}: Remove call to text() on string ([[phab:T289692|T289692]]) (duration: 01m 04s)
* 18:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e7c8c041faa974585128c48631522a401fb3d41d}}: Add Wikimedia ES to $wgCopyUploadsDomains whitelist ([[phab:T289446|T289446]]) (duration: 01m 04s)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6df0803e4eaca91bd725bcd376b260b97917de3}}: Disable legacy media dom on a few more wikis ([[phab:T51097|T51097]]) (duration: 01m 05s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5182ac88263f23c15a3b10d0f3bc2e492fe425d5}}: Disable upcoming DiscussionTools automatic topic subscriptions for now (duration: 01m 04s)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b14eb525e99008d5103a93c5bd01f75211dca99}}: Enable topic subscriptions as a beta feature on Wikipedias except enwiki ([[phab:T287801|T287801]]) (duration: 01m 06s)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: [[gerrit:714674{{!}}Set EntityHandler::generateHTMLOnEdit to false (T285987)]] (duration: 01m 06s)
* 17:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase: Backport: [[gerrit:714677{{!}}Return normalized snaks from SetClaim, SetReference (T289501)]] (duration: 01m 11s)
* 17:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 ryankemper: [[phab:T289483|T289483]] Depooled `wdqs1013`
* 17:14 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: [[gerrit:714675{{!}}Set EntityHandler::generateHTMLOnEdit to false (T285987)]] (duration: 01m 18s)
* 17:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:22 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in mywiki shell.php session (same issue as [[phab:T289690|T289690]])
* 15:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zh_yuewiki growthexperiments # [[phab:T289680|T289680]]
* 15:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments/includes/Config/WikiPageConfigWriter.php: {{Gerrit|0b9ca1e11c1f0397847d4cfc7bc86220b6ebe9f6}}: WikiPageConfigWriter: Fix `autopatrol` right name ([[phab:T288886|T288886]]) (duration: 01m 04s)
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 3/3) (duration: 01m 06s)
* 14:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 14:56 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 2/3) (duration: 01m 05s)
* 14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 1/3) (duration: 01m 06s)
* 14:54 urbanecm@deploy1002: sync-file aborted: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]) (duration: 00m 01s)
* 14:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 14:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:46 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:42 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=brwiki # [[phab:T289690|T289690]], [[phab:T289680|T289680]]
* 14:40 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in brwiki shell.php session ([[phab:T289690|T289690]])
* 14:35 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:32 urbanecm: mwmaint2002: scap pull # clearing temporary config changes
* 14:30 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
* 14:26 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:23 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php # [[phab:T289680|T289680]] # r714765 applied at mwmaint2002
* 14:22 urbanecm: Apply https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/714765/ at mwmaint2002 temporarily ([[phab:T289680|T289680]])
* 14:21 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:20 urbanecm: Create GrowthExperiments DB tables for wikis listed in P17081 ([[phab:T289680|T289680]])
* 14:20 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
* 14:18 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
* 14:17 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
* 14:15 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
* 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
* 14:10 ejegg: updated fundraising CiviCRM from {{Gerrit|d60442e119}} to {{Gerrit|13bf3a02df}}
* 14:08 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
* 13:59 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
* 13:59 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
* 13:57 ejegg: updated fundraising CiviCRM from {{Gerrit|42bb64c608}} to {{Gerrit|d60442e119}}
* 13:53 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
* 13:53 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
* 13:51 volans: upgraded spicerack to 0.0.58 on cumin2002
* 13:37 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213] (duration: 05m 55s)
* 13:32 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213]
* 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213] (duration: 00m 07s)
* 13:31 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213]
* 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213] (duration: 20m 25s)
* 13:10 joal@deploy1002: Started deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213]
* 13:03 jayme: restarted all pods in kube-system namespace in codfw k8s cluster - [[phab:T289131|T289131]]
* 12:25 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:21 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 jayme: slowly restarting all pods in kube-system namespace in eqiad k8s cluster - [[phab:T289131|T289131]]
* 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-coord1002.eqiad.wmnet
* 11:32 kharlan@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: Backport: [[gerrit:714670{{!}}ApiVisualEditorEdit: data-<nowiki>{</nowiki>plugin<nowiki>}</nowiki> is not multi (T289652)]] (duration: 01m 06s)
* 11:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 volans: uploaded spicerack_0.0.58 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 10:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 10:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 10:49 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:714672{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]], Part II (duration: 01m 04s)
* 10:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/content/ContentHandler.php: Backport: [[gerrit:714672{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]], Part I (duration: 01m 08s)
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:21 jbond: rolling out openssl updates
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/includes: Backport: [[gerrit:714671{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]] (duration: 02m 17s)
* 10:01 mutante: removed jmads from wmf group
* 09:59 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-coord1002.eqiad.wmnet
* 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 08:59 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
* 08:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
* 08:17 godog: swift codfw add ms-be20[62-65] with initial weight - [[phab:T288458|T288458]]
* 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17078 and previous config saved to /var/cache/conftool/dbconfig/20210825-064319-marostegui.json
* 06:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging [[phab:T288244|T288244]]
* 06:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging [[phab:T288244|T288244]]
* 06:07 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2118 until it's reimaged to buster [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17077 and previous config saved to /var/cache/conftool/dbconfig/20210825-060742-kormat.json
* 06:02 kormat@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary and set section read-write [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17076 and previous config saved to /var/cache/conftool/dbconfig/20210825-060222-kormat.json
* 06:01 kormat@cumin1001: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17075 and previous config saved to /var/cache/conftool/dbconfig/20210825-060112-kormat.json
* 06:00 kormat: Starting s7 codfw failover from db2118 to db2121 - [[phab:T289129|T289129]]
* 05:33 eileen: civicrm revision changed from {{Gerrit|a4ce949828}} to {{Gerrit|42bb64c608}}, config revision is {{Gerrit|1afcea7f5b}}
* 05:28 kormat: Moving s7 codfw replicas under db2121 - [[phab:T289129|T289129]]
* 05:27 kormat@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 [[phab:T289129|T289129]]', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20210825-052741-kormat.json
* 05:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:04:00 on 27 hosts with reason: Primary switchover s7 [[phab:T289129|T289129]]
* 05:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:04:00 on 27 hosts with reason: Primary switchover s7 [[phab:T289129|T289129]]
* 02:06 eileen: civicrm revision changed from {{Gerrit|8ed303f2d1}} to {{Gerrit|a4ce949828}}, config revision is {{Gerrit|ac2d75d4a8}}
* 00:53 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 00:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 00:47 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
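
Entries in this log appear to share one shape: a bullet, a UTC `HH:MM` timestamp, an actor (optionally `user@host`), a colon, and a free-form message. A minimal sketch of a parser for that shape follows. The `Entry` type, the `parse_entry` function, and the regular expression are hypothetical illustrations inferred from the lines above, not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str            # "HH:MM" as written in the log
    actor: str           # user or bot name before any "@host"
    host: Optional[str]  # host after "@", or None when absent
    message: str         # everything after the first ": "


# Assumed line shape: "* HH:MM actor[@host]: message"
_ENTRY_RE = re.compile(r"^\* (\d{2}:\d{2}) ([^\s:@]+)(?:@([^\s:]+))?: (.*)$")


def parse_entry(line: str) -> Optional[Entry]:
    """Return an Entry for a log bullet, or None for headings and other lines."""
    match = _ENTRY_RE.match(line.rstrip())
    if match is None:
        return None
    return Entry(match.group(1), match.group(2), match.group(3), match.group(4))
```

Date headings such as `== 2021-08-24 ==` do not match and yield `None`, so a caller can track the current date separately while iterating over lines.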


== May 30 ==
== 2021-08-24 ==
* 21:07 bd808: Upgraded Elasticsearch cluster to 1.3.9 on logstash100[1-6]
* 22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:35 logmsgbot: hoo Synchronized php-1.26wmf7/extensions/UploadWizard/: Touch js… (duration: 00m 18s)
* 22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 17:06 logmsgbot: legoktm Synchronized php-1.26wmf8/extensions/WikiEditor/extension.json: Explicitly define module position (duration: 00m 13s)
* 21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per [[phab:T282873|T282873]]#7303828
* 05:32 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 30 05:31:02 UTC 2015 (duration 31m 1s)
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:56 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-05-30 02:55:22+00:00
* 20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6fd96b15e6e3c068c2faac60208b9722d32af0f}}: Growth features: Promote 9 wikis out of dark mode ([[phab:T287871|T287871]]; [[phab:T287874|T287874]]; [[phab:T287872|T287872]]; [[phab:T287880|T287880]]; [[phab:T287868|T287868]]; [[phab:T287873|T287873]]; [[phab:T287879|T287879]]; [[phab:T287875|T287875]]; [[phab:T287876|T287876]]) (duration: 01m 25s)
* 02:52 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 40s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:36 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-05-30 02:34:55+00:00
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 50s)
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:15 ori: Deployed rcstream I797bc1244: Handle invalid JSON gracefully
* 20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
* 00:08 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/212436/ - docs only, no code change (how was this waiting 10 days?) (duration: 00m 14s)
* 20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
* 20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
* 20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
* 17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
* 15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 13:12 XioNoX: push pfw policies - [[phab:T289353|T289353]]
* 12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 12:33 godog: test patched python3-eventlet on thanos-fe1003 - [[phab:T283714|T283714]]
* 12:30 marostegui: Install 10.4.21 on clouddb1015
* 11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 09:08 jbond: upload new statograph version
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - [[phab:T283714|T283714]]
* 07:51 dcausse: repool wdqs1012 [[phab:T289551|T289551]]
* 07:29 dcausse: restarting blazegraph on wdqs1012
* 07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
* 07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
* 06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 03:51 rzl: rzl@wdqs1012:~$ sudo depool
* 03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
* 00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
* 00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
* 00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`
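
The WDQS deploy entries above restart services "one node at a time" or "4 hosts at a time" by passing a batch size to `cumin -b`. The sketch below only illustrates the idea of limiting how many hosts are touched at once by splitting a host list into fixed-size groups. It is a simplified model, not cumin's actual scheduling, and the `batches` helper and the host names are hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts, preserving order."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


if __name__ == "__main__":
    # Placeholder host names, not real inventory.
    example_hosts = ["host1", "host2", "host3", "host4", "host5"]
    for group in batches(example_hosts, 4):
        # A real runner would execute the restart command on every host
        # in `group` and wait for completion before taking the next group.
        print("restart:", ", ".join(group))
```

With a batch size of 1, each host is depooled, restarted, and repooled before the next one is touched, which keeps the rest of the pool serving traffic throughout.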


== May 29 ==
== 2021-08-23 ==
* 23:56 logmsgbot: ori Synchronized w/static/images/project-logos: Ic62747f37: Optimise project logos added since I8c9a6a56 (duration: 00m 13s)
* 23:41 ryankemper: [[phab:T285355|T285355]] `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
* 21:21 logmsgbot: ori Synchronized wmf-config/throttle.php: Ife45684c5: Add another IP address for Santiago edit-a-thon (duration: 00m 13s)
* 23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:43 logmsgbot: ori Synchronized robots.txt: I7b321b62d: allow robots to use RL on domains (duration: 00m 14s)
* 18:56 tgr: morning deploys done
* 17:18 mutante: fix client_max_body_size syntax error in nginx config of payments1001
* 18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: [[gerrit:714158{{!}}Add Link: store when tasks were generated (T284551)]] (duration: 00m 57s)
* 15:19 logmsgbot: anomie Synchronized php-1.26wmf8/extensions/ConfirmEdit/: Update ConfirmEdit to fix API breakage [[gerrit:214620]] (duration: 00m 14s)
* 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 paravoid: re-redirecting ns0 traffic back to rubidium
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 jynus: Moving pdns and designate databases from m1 to m5
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:30 logmsgbot: aude Synchronized php-1.26wmf8/extensions/Wikidata: touch js and css files to try to fix issues on test.wikidata (duration: 00m 26s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 godog: roll-restart cassandra on cerium / xenon / praseodymium following java upgrade
* 18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713907{{!}}wmfSetupEtcd only supports array input]] (duration: 00m 57s)
* 11:53 paravoid: reimaging rubidium
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:45 _joe_: restart nutcracker on mw1150
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 paravoid: redirecting ns0 traffic to baham (= ns1) in preparation for rubidium upgrade
* 18:23 dancy@deploy1002: Synchronized wmf-config: Config: [[gerrit:713906{{!}}Use array format to specify etcd server]] (duration: 00m 57s)
* 06:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 29 06:51:45 UTC 2015 (duration 51m 44s)
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:13 logmsgbot: ori Synchronized php-1.26wmf7/includes/deferred/SiteStatsUpdate.php: Icc12c07ab: Update context stats in SiteStatsUpdate (duration: 00m 13s)
* 18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713704{{!}}Allow protocol for etcd server to be specified]] (duration: 00m 57s)
* 06:12 logmsgbot: ori Synchronized php-1.26wmf8/includes/deferred/SiteStatsUpdate.php: Icc12c07ab: Update context stats in SiteStatsUpdate (duration: 00m 14s)
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:03 apergos: salt keys regenerated on all production hosts (minions, not master key)
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:09 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-05-29 03:08:15+00:00
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:02 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 10m 08s)
* 17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
* 02:36 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-05-29 02:35:10+00:00
* 17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 54s)
* 16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
* 00:07 logmsgbot: ori Synchronized php-1.26wmf7/includes/diff/UnifiedDiffFormatter.php: d95cac90c7: Make the output of UnifiedDiffFormatter match diff -u (duration: 00m 14s)
* 16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
* 00:06 logmsgbot: ori Synchronized php-1.26wmf7/extensions/Echo/includes/DiffParser.php: 41d27c4a26: Update Echo for cherry-picks (duration: 00m 13s)
* 16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 3/3) (duration: 00m 56s)
* 14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 2/3) (duration: 00m 57s)
* 14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 1/3) (duration: 00m 57s)
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=[[phab:T287867|T287867]] # [[phab:T287867|T287867]]
* 14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # [[phab:T287867|T287867]]
* 14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 00m 57s)
* 11:35 Lucas_WMDE: EU backport+config window done
* 11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (2/2) (duration: 00m 57s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (1/2) (duration: 00m 58s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:713860{{!}}Revert "Enable NewUserMessage on hiwiktionary" (T287091)]] (duration: 00m 57s)
* 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714152{{!}}Add extra sleep option between each batch in pruneRevData.php (T289249)]] (duration: 00m 58s)
* 09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master ([[phab:T288400|T288400]], [[phab:T288897|T288897]])
* 09:53 urbanecm: Deploy security patch for [[phab:T289408|T289408]]
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 09:01 godog: pooling swift in eqiad - [[phab:T288458|T288458]]
* 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714322{{!}}Set request languages rdf output for wikidata to true (T285795)]] (duration: 00m 57s)
* 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
* 07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714151{{!}}Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249)]] (duration: 00m 59s)
* 07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
* 07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
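
Cookbook runs in this log are bracketed by a `START` line and an `END (PASS)` or `END (FAIL)` line carrying an exit code. A minimal sketch for tallying outcomes per cookbook from raw log lines follows. The `tally_cookbook_results` function and its regular expression are hypothetical and assume the `END` line format seen in the entries above.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed shape: "END (PASS|FAIL) - Cookbook <name> (exit_code=<n>)"
_END_RE = re.compile(r"END \((PASS|FAIL)\) - Cookbook (\S+) \(exit_code=(\d+)\)")


def tally_cookbook_results(lines: Iterable[str]) -> Counter:
    """Count END lines, keyed by (cookbook name, "PASS" or "FAIL")."""
    counts: Counter = Counter()
    for line in lines:
        match = _END_RE.search(line)
        if match:
            key: Tuple[str, str] = (match.group(2), match.group(1))
            counts[key] += 1
    return counts
```

`START` lines are ignored, so the tally reflects completed runs only. A run that never logged an `END` line would not be counted.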


== May 28 ==
== 2021-08-21 ==
* 23:33 jgage: restarted nutcracker on mw1056 due to errors, per bd808
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:18 logmsgbot: catrope Synchronized php-1.26wmf7/includes/EditPage.php: Fix regression with URL-specified edit tags (duration: 00m 13s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:18 logmsgbot: catrope Synchronized php-1.26wmf6/includes/EditPage.php: Fix regression with URL-specified edit tags (duration: 00m 13s)
* 23:04 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable A/B test of VE for new accounts on enwiki (duration: 00m 13s)
* 22:48 logmsgbot: hoo Synchronized php-1.26wmf7/: Touching some JS, re-syncing resource definitions to rule out causes for Wikidata JS problem. (duration: 01m 00s)
* 21:52 logmsgbot: ori Synchronized php-1.26wmf7/resources/src/mediawiki/mediawiki.toc.js: Touching file on unconfirmed suspicion of stale cache (duration: 00m 16s)
* 21:51 logmsgbot: ori Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.toc.js: Touching file on unconfirmed suspicion of stale cache (duration: 00m 15s)
* 20:24 mutante: killed nodejs on wtp1023,wtp1016
* 20:11 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on Wikivoyage (duration: 00m 13s)
* 20:03 cscott: updated Parsoid to version 497da30e ; canary restart of wtp1001; observed network TX spike (possibly UDP, possibly logging); reverted to 8ed6fd0b and restarted all parsoids.
* 19:33 mutante: temp. stopped icinga-wm
* 19:05 logmsgbot: legoktm Synchronized php-1.26wmf8/extensions/Gadgets/: Explicitly define module position (duration: 00m 14s)
* 18:32 logmsgbot: legoktm Synchronized php-1.26wmf7/extensions/GlobalCssJs/: Explicitly define module position (duration: 00m 12s)
* 18:24 logmsgbot: legoktm Synchronized php-1.26wmf8/extensions/GlobalCssJs/: Explicitly define module position (duration: 00m 13s)
* 18:22 logmsgbot: krenair Synchronized php-1.26wmf6/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/214397/ - in case we have to go back to wmf6 again for whatever reason (duration: 00m 15s)
* 18:20 logmsgbot: krenair Synchronized php-1.26wmf8/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/214396/ (duration: 00m 13s)
* 18:17 logmsgbot: krenair Synchronized php-1.26wmf7/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/214395/ (duration: 00m 14s)
* 17:29 logmsgbot: twentyafterfour Finished scap: Group0 to 1.26wmf8, everything else to 1.26wmf7 (duration: 28m 16s)
* 17:01 logmsgbot: twentyafterfour Started scap: Group0 to 1.26wmf8, everything else to 1.26wmf7
* 16:59 paravoid: reimaging baham
* 16:52 paravoid: redirecting ns1 traffic to rubidium (= ns0) in preparation for baham upgrade
* 15:54 logmsgbot: kartik Finished scap: Update ContentTranslation (duration: 03m 19s)
* 15:50 logmsgbot: kartik Started scap: Update ContentTranslation
* 15:47 logmsgbot: thcipriani Synchronized wmf-config/abusefilter.php: SWAT: Modify AbuseFilter block configuration on eswikibooks [[gerrit:206510]] (duration: 00m 15s)
* 15:40 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Prevent indexing of User: namespace on ukwiki [[gerrit:210680]] (duration: 00m 14s)
* 15:35 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable NewUserMessage on sa.wikipedia [[gerrit:212724]] (duration: 00m 13s)
* 15:28 godog: set operations/debs/python-statsd as hidden in gerrit -- deprecated
* 15:24 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Extension:NewUserMessage on ta.wikipedia [[gerrit:213841]] (duration: 00m 12s)
* 15:13 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable SandboxLink for cswiki [[gerrit:214247]] (duration: 00m 15s)
* 15:11 godog: set operations/debs/txstatsd as hidden in gerrit -- deprecated
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for CX deployment on 20150528 [[gerrit:213992]] (duration: 00m 15s)
* 15:00 bblack: merged up https://gerrit.wikimedia.org/r/214345 - look here if IPv6 problems!
* 14:37 cmjohnson1: powering down dataset1001 to add disk array
* 14:17 bblack: deploying https://gerrit.wikimedia.org/r/214341 - keep in mind if ipv6-related issues arise!
* 13:50 akosiaris: started ircecho (icinga-wm) on neon
* 13:46 hashar: upgrading Jenkins git plugin from 1.4.6+wmf1 to 1.7.1 {{bug|T100655}} and restarting Jenkins
* 13:25 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool pc1003 (not to confuse with db1003) after warmup (duration: 00m 15s)
* 13:11 akosiaris: killed ircecho service on neon
* 09:48 _joe_: depooling the HHVM appserver. 503s reduced slightly but still non-irrelevant
* 09:37 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool pc1003 (duration: 00m 15s)
* 09:35 _joe_: pooling mw1152 into the imagescalers pool after fixes made in Lyon
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 28 06:09:56 UTC 2015 (duration 9m 55s)
* 04:22 springle: reload dbstore1002 s7
* 02:41 logmsgbot: LocalisationUpdate completed (1.26wmf6) at 2015-05-28 02:40:00+00:00
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf6/cache/l10n: (no message) (duration: 06m 46s)
* 02:20 springle: set global read_only=0 on pc1001 pc1002. this config broke in the recent upgrade
* 00:59 logmsgbot: legoktm Synchronized php-1.26wmf8/resources/: Revert "Convert mediawiki.toc and mediawiki.user to using mw.cookie" (duration: 00m 17s)
* 00:58 logmsgbot: legoktm Synchronized php-1.26wmf7/resources/: Revert "Convert mediawiki.toc and mediawiki.user to using mw.cookie" (duration: 00m 13s)
* 00:07 logmsgbot: twentyafterfour Synchronized rpc/RunJobs.php: deploy I98b8a4ddbcdd58d1f2f23e4b1bf154f10b6b279e (duration: 00m 17s)


== May 27 ==
== 2021-08-20 ==
* 23:46 awight: updated payments from 858b87319daa3d66f62eb32e08cefc6b061748d1 to aa66797553fbcfb63f7cf29abccc44d060b65db0
* 23:17 legoktm: deployed patch for [[phab:T289385|T289385]]
* 23:31 logmsgbot: twentyafterfour Finished scap: scap, now with 10% less fail (duration: 22m 07s)
* 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1141.eqiad.wmnet
* 23:26 awight: payments rolled back to 858b87319daa3d66f62eb32e08cefc6b061748d1
* 17:01 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1141.eqiad.wmnet
* 23:24 awight: updated payments from 858b87319daa3d66f62eb32e08cefc6b061748d1 to aa66797553fbcfb63f7cf29abccc44d060b65db0
* 16:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1140.eqiad.wmnet
* 23:09 logmsgbot: twentyafterfour Started scap: scap, now with 10% less fail
* 16:56 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1140.eqiad.wmnet
* 22:57 logmsgbot: ori rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1139.eqiad.wmnet
* 21:49 mutante: restarted hhvm on mw1250,mw1254,mw1256
* 16:54 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1139.eqiad.wmnet
* 21:47 mutante: restarted hhvm on mw1017,mw1243,mw1244
* 16:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1134.eqiad.wmnet
* 21:42 bblack: restarting hhvm everywhere on 30s intervals between hosts
* 16:43 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1134.eqiad.wmnet
* 21:10 logmsgbot: twentyafterfour Synchronized php-1.26wmf8: Fix ConfirmEdit fatal Change-Id: I22353669a85391c3d9760a5253cac1263e895cf9 (duration: 01m 08s)
* 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1133.eqiad.wmnet
* 20:46 logmsgbot: twentyafterfour Purged l10n cache for 1.26wmf6
* 16:36 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1133.eqiad.wmnet
* 20:45 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf8
* 15:37 jayme: deleting various pods from staging to have them recreated with priorities - [[phab:T289131|T289131]]
* 20:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.26wmf7
* 15:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1129.eqiad.wmnet
* 20:36 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf8 and rebuild l10n cache (duration: 67m 53s)
* 15:23 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1129.eqiad.wmnet
* 19:40 akosiaris: removed operations/puppet/varnish from gerrit, git.wikimedia.org and github. The repo was used as a git submodule but the workflow turned out to be cumbersome approximately a year ago and was no longer updated. Up to a few minutes ago, it only served as a source of confusion. It no longer does.
* 15:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:28 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf8 and rebuild l10n cache
* 14:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 19:22 logmsgbot: twentyafterfour scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_1863397713" --threads=4 --lang en  --quiet' returned non-zero exit status 255 (duration: 03m 38s)
* 14:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 19:18 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf8 and rebuild l10n cache
* 13:54 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:12 moritzm: Uploaded gridengine_6.2u5-4+wmf2 for precise-wikimedia to apt.wikimedia.org
* 13:48 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:55 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool pc1002 (duration: 00m 13s)
* 12:00 jayme: enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - [[phab:T289131|T289131]]
* 17:42 paravoid: rebooting asw-d2-eqiad
* 11:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:41 ottomata: initiating controlled shutdown of kafka broker analytics1018 in anticipation of switch reboot
* 10:35 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:33 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool pc1002 (duration: 00m 13s)
* 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1001.eqiad.wmnet
* 15:02 cmjohnson1: powering down cp1069 to relocate within the same rack
* 09:32 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:47 cmjohnson1: powering down cp1070 to relocate within the same rack
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1001.eqiad.wmnet
* 13:30 hashar: All Jenkins slaves are disconnected due to some ssh error. CI is down.
* 08:48 godog: roll depool/pool thanos-fe to apply swift change - [[phab:T288815|T288815]]
* 13:27 hashar: restarting Jenkins for java upgrade
* 08:43 godog: temp depool thanos-fe2003 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/713815
* 13:13 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool pc1001 (duration: 00m 13s)
* 08:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 11:16 akosiaris: rebooting ganeti100{1..4} for bridge networking configuration
* 08:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 09:59 paravoid: powercycling ms-be1001; dead, console unresponsive
* 07:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 06:35 springle: clone dbstore2001 data to dbstore2002
* 07:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 05:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed May 27 05:47:25 UTC 2015 (duration 47m 24s)
* 07:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 02:53 logmsgbot: LocalisationUpdate completed (1.26wmf7) at 2015-05-27 02:52:25+00:00
* 07:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 02:48 logmsgbot: l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 52s)
* 07:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 02:29 logmsgbot: LocalisationUpdate completed (1.26wmf6) at 2015-05-27 02:28:34+00:00
* 07:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf6/cache/l10n: (no message) (duration: 06m 45s)
* 06:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:07 TimStarling: sending election email to 44k people
* 03:15 legoktm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Score/scripts/removeTagline.php: removeTagline: Set explicit pcre.backtrack_limit ([[phab:T289298|T289298]]) (duration: 00m 58s)
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/makeMailingList.php: code that uses said hack (duration: 00m 57s)
* 00:12 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/includes/User/LocalAuth.php: hack for mailout (duration: 00m 58s)
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== May 26 ==
== 2021-08-19 ==
* 18:21 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: Group1 wikis to 1.26wmf7
* 23:15 brennen: ended backport & config window early, as no patches were scheduled and no new attendees for this week
* 17:13 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 15s)
* 22:42 ejegg: updated payments-wiki from {{Gerrit|0a27dbe9b6}} to {{Gerrit|564daed816}}
* 17:10 logmsgbot: krenair Synchronized multiversion/MWMultiVersion.php: open cnwikimedia (duration: 00m 13s)
* 21:20 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune ([[phab:T289249|T289249]])
* 16:27 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:12 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: add cnwikimedia
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:08 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.19
* 16:07 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 15s)
* 19:07 razzi@deploy1002: Finished deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}} (duration: 03m 30s)
* 16:07 logmsgbot: krenair Synchronized w/static/images/project-logos/cnwikimedia.png: (no message) (duration: 00m 19s)
* 19:03 razzi@deploy1002: Started deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}}
* 15:52 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1063 (duration: 00m 14s)
* 18:27 razzi: Beginning aqs deploy process
* 15:32 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1063 (warm period) (duration: 00m 13s)
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon2001.codfw.wmnet
* 15:24 logmsgbot: krenair Synchronized wmf-con