You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207900 (b74911f6201) enable csp users with session all group1 wikis (duration: 00m 55s))
imported>Stashbot
(mutante: icinga1001 - using wmf-auto-reimage to reinstall gets stuck at initial puppet run after reboot - Still waiting for Puppet after 105.0 minutes - aborting on cumin, logging in directly and manually running puppet (T202782 T208100))
== 2018-10-27 ==
* 00:00 mutante: icinga1001 - using wmf-auto-reimage to reinstall gets stuck at initial puppet run after reboot - Still waiting for Puppet after 105.0 minutes - aborting on cumin, logging in directly and manually running puppet ([[phab:T202782|T202782]] [[phab:T208100|T208100]])
== 2018-10-26 ==
* 22:54 mutante: sodium - attempted to replace broken disk for RAID - did not go well
* 21:38 ejegg: updated fundraising CiviCRM from {{Gerrit|97506677e8}} to {{Gerrit|65130ef3dd}}
* 21:34 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/autoload.php: {{Gerrit|86c0b56b0d1bf66073fafb9bc00bafb87d2e3b9c}} (duration: 00m 52s)
* 21:33 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/tests: {{Gerrit|86c0b56b0d1bf66073fafb9bc00bafb87d2e3b9c}} (duration: 01m 08s)
* 20:03 mutante: icinga1001 - disabled puppet, changed:  check_result_reaper_frequency=2 ; max_check_result_reaper_time=10  to test if it lowers latency ([[phab:T208066|T208066]])
* 19:40 chasemp: remove 2fa for charlottepotero and cwd users in phab (so they can re-add)
* 19:09 SMalyshev: repooled wdqs1003 - looks like it caught up now
* 17:18 SMalyshev: depool wdqs1003 again to let it catch up some more
* 16:10 ejegg: updated payments-wiki to {{Gerrit|34506ce636}}
* 15:32 elukey: rolling restart of all prometheus-mcrouter-exporters on app/api servers - metrics not reported after the last mcrouter restart
* 15:20 gehel: repooling wdqs1003, other nodes are starting to lag as well
* 14:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent sysops from disabling 2FA for other users as part of upcoming feature (duration: 00m 53s)
* 13:02 gehel: depool wdqs1003 to catch up on updates
* 07:51 bawolff: adjust patch [[phab:T207916|T207916]]
* 07:16 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T207916|T207916]] {{Gerrit|13b993ab9f}} - auth log on in arwiki (duration: 00m 54s)
* 06:51 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T207916|T207916]] {{Gerrit|13b993ab9f}} - auth log on in group1 (duration: 00m 54s)
* 06:33 moritzm: uploaded openjdk-8 backport for recent Java 8 security updates to apt.wikimedia.org/jessie-wikimedia
* 06:24 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ecf579e9f9}} - [[phab:T207916|T207916]] - enable auth log group0 (duration: 00m 55s)
* 06:01 bawolff: adjust patch for [[phab:T207916|T207916]]
* 05:07 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d3b2c346}} [[phab:T207916|T207916]] (duration: 00m 55s)
* 04:58 SMalyshev: depooled wdqs1003 again, let's see if it helps it catch up now
* 04:12 bawolff: deploy patch [[phab:T207916|T207916]]
* 03:19 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc9b863e}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session everywhere (duration: 00m 55s)
* 03:10 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|745d0b61}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session enwiki (duration: 00m 55s)
* 03:01 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T207900|T207900]] - Add wikimedia.org (no subdomain) to allow list for math (duration: 00m 53s)
* 02:54 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|745d0b61}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session enwiki (duration: 00m 53s)
* 02:43 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a8aa9d6aae}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session fawiki, frwiki, svwiki, eswiki, ruwiki, zhwiki, dewiki (duration: 00m 56s)
* 02:22 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d743db261}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session arwiki (duration: 00m 54s)
* 02:05 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd55034d122}} - [[phab:T207900|T207900]] - enable CSP report only for users w/session on medium wikis (duration: 00m 55s)
* 01:41 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T207900|T207900]] ({{Gerrit|b74911f6201}}) enable csp users with session all group1 wikis (duration: 00m 55s)
* 01:28 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ia518c031}} (duration: 00m 55s)

Revision as of 00:00, 27 October 2018

2018-10-27

  • 00:00 mutante: icinga1001 - using wmf-auto-reimage to reinstall gets stuck at initial puppet run after reboot - Still waiting for Puppet after 105.0 minutes - aborting on cumin, logging in directly and manually running puppet (T202782 T208100)

2018-10-26

  • 22:54 mutante: sodium - attempted to replace broken disk for RAID - did not go well
  • 21:38 ejegg: updated fundraising CiviCRM from 97506677e8 to 65130ef3dd
  • 21:34 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/autoload.php: 86c0b56 (duration: 00m 52s)
  • 21:33 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/tests: 86c0b56 (duration: 01m 08s)
  • 20:03 mutante: icinga1001 - disabled puppet, changed: check_result_reaper_frequency=2 ; max_check_result_reaper_time=10 to test if it lowers latency (T208066)
  • 19:40 chasemp: remove 2fa for charlottepotero and cwd users in phab (so they can re-add)
  • 19:09 SMalyshev: repooled wdqs1003 - looks like it caught up now
  • 17:18 SMalyshev: depool wdqs1003 again to let it catch up some more
  • 16:10 ejegg: updated payments-wiki to 34506ce636
  • 15:32 elukey: rolling restart of all prometheus-mcrouter-exporters on app/api servers - metrics not reported after the last mcrouter restart
  • 15:20 gehel: repooling wdqs1003, other nodes are starting to lag as well
  • 14:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent sysops from disabling 2FA for other users as part of upcoming feature (duration: 00m 53s)
  • 13:02 gehel: depool wdqs1003 to catch up on updates
  • 07:51 bawolff: adjust patch T207916
  • 07:16 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207916 13b993ab9f - auth log on in arwiki (duration: 00m 54s)
  • 06:51 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207916 13b993ab9f - auth log on in group1 (duration: 00m 54s)
  • 06:33 moritzm: uploaded openjdk-8 backport for recent Java 8 security updates to apt.wikimedia.org/jessie-wikimedia
  • 06:24 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ecf579e9f9 - T207916 - enable auth log group0 (duration: 00m 55s)
  • 06:01 bawolff: adjust patch for T207916
  • 05:07 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d3b2c346 T207916 (duration: 00m 55s)
  • 04:58 SMalyshev: depooled wdqs1003 again, let's see if it helps it catch up now
  • 04:12 bawolff: deploy patch T207916
  • 03:19 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc9b863e - T207900 - enable CSP report only for users w/session everywhere (duration: 00m 55s)
  • 03:10 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 745d0b61 - T207900 - enable CSP report only for users w/session enwiki (duration: 00m 55s)
  • 03:01 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: T207900 - Add wikimedia.org (no subdomain) to allow list for math (duration: 00m 53s)
  • 02:54 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 745d0b61 - T207900 - enable CSP report only for users w/session enwiki (duration: 00m 53s)
  • 02:43 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a8aa9d6aae - T207900 - enable CSP report only for users w/session fawiki, frwiki, svwiki, eswiki, ruwiki, zhwiki, dewiki (duration: 00m 56s)
  • 02:22 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d743db261 - T207900 - enable CSP report only for users w/session arwiki (duration: 00m 54s)
  • 02:05 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bd55034d122 - T207900 - enable CSP report only for users w/session on medium wikis (duration: 00m 55s)
  • 01:41 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207900 (b74911f6201) enable csp users with session all group1 wikis (duration: 00m 55s)
  • 01:28 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: Ia518c031 (duration: 00m 55s)
  • 01:26 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e9392f4]: Re-deploy Updater to deal with performance issues (duration: 31m 28s)
  • 01:21 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: T207900 - deploy CSP to people with session on enwikiquote (duration: 00m 54s)
  • 01:19 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207900 - deploy CSP to people with session on enwikiquote (duration: 00m 55s)
  • 00:56 twentyafterfour: twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.1 refs T206655
  • 00:55 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e9392f4]: Re-deploy Updater to deal with performance issues
  • 00:39 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Post-SWAT: De-register all entities on WBMI installations calling themselves Commons I09e066f2 (duration: 00m 56s)
  • 00:32 ejegg: updated payments-wiki from f5999d963d to 57e8438e9c
  • 00:18 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: Define and specify lexeme NS for wikidatawiki (duration: 00m 55s)

2018-10-25

  • 23:59 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.1/includes/parser/Parser.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/469799/ refs T208000 (duration: 00m 56s)
  • 23:57 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/469799/
  • 23:53 jforrester@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/CentralNotice/special/SpecialCentralNotice.php: SWAT Sync versions of SpecialCentralNotice to avoid dirty repo checkout T208004 (duration: 00m 56s)
  • 23:39 maxsem@deploy1001: Synchronized php-1.33.0-wmf.1/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralNotice/+/469794/ (duration: 00m 57s)
  • 23:26 maxsem@deploy1001: Synchronized php-1.33.0-wmf.1/extensions/GlobalPreferences/: https://gerrit.wikimedia.org/r/c/469793/ (duration: 00m 58s)
  • 21:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rolling back group1 refs T206655 T208000
  • 21:29 XioNoX: configure 208.80.153.185/29 on cr1/2-codfw - T207663
  • 21:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.1 refs T206655
  • 20:48 twentyafterfour: staying at group1, error rate seems to have stabilized
  • 20:43 ejegg: updated fundraising python tools from 5a2d39b41b to af5dbee8eb
  • 20:37 ejegg: updated standalone SmashPig deployment from b638ca02bc to f65daa8550
  • 20:32 twentyafterfour: db error rate increased again. rolling back
  • 20:31 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.1 refs T206655 (duration: 00m 54s)
  • 20:30 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.1 refs T206655
  • 20:07 jgleeson_: Updated paymentswiki from 5a7a8e7e4f to f5999d963d
  • 20:04 twentyafterfour: still haven't deployed wmf.1 yet; error rate increased and icinga is alerting about mediawiki exceptions + wdqs1010 degraded
  • 20:02 twentyafterfour@deploy1001: Finished scap: full sync to be sure that 1.33.0-wmf.1 is fully deployed (duration: 36m 57s)
  • 19:54 mutante: mw1272 - repooled (T207983)
  • 19:51 mutante: mw1272 - rebooting (a stop job is running for HHVM PHP/Hack runtime) (T207983)
  • 19:47 mutante: mw1272 - depooled, restarting hhvm (T207983)
  • 19:45 mutante: mw1272 - depooled
  • 19:26 twentyafterfour@deploy1001: Started scap: full sync to be sure that 1.33.0-wmf.1 is fully deployed
  • 19:23 twentyafterfour: beginning mediawiki train. Will start with group1 and then monitor the situation for a few minutes. If everything looks good then we go to group2.
  • 19:20 sbisson@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/ContentTranslation/: SWAT: Add detailed logging for AbuseFilter (duration: 00m 56s)
  • 18:37 sbisson@deploy1001: Synchronized php-1.33.0-wmf.1/extensions/ContentTranslation/: SWAT: Remove the session parameter from AbuseFilter logging (duration: 00m 56s)
  • 17:58 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/extensions/Translate/tag: c5fa239 (duration: 00m 55s)
  • 17:52 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/includes/page/WikiPage.php: f3b5a1d (duration: 00m 54s)
  • 17:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4967dba]: Test deploy new update & scripts (duration: 00m 28s)
  • 17:51 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4967dba]: Test deploy new update & scripts
  • 17:50 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/tests/phpunit/includes/page/WikiPageDbTestBase.php: f3b5a1d (duration: 00m 55s)
  • 17:36 aaron@deploy1001: Synchronized php-1.33.0-wmf.1/includes/changetags/ChangeTags.php: 08f8e6a (duration: 00m 55s)
  • 17:24 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@95452cf]: Update mobileapps to 58cbdff (T206527) (duration: 03m 50s)
  • 17:20 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@95452cf]: Update mobileapps to 58cbdff (T206527)
  • 17:20 mutante: planet - regenerating feeds for 'en' and 'de', others will follow by cron. switching to new theme. replaced bootstrap with bulma. removed jQuery. thanks to paladox
  • 16:34 gehel@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,cluster=wdqs,name=wdqs1005.codfw.wmnet
  • 16:34 gehel@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,cluster=wdqs,name=wdqs1004.codfw.wmnet
  • 16:34 gehel: decreasing relative weight of wdqs1003 in LVS to ease the updater
  • 16:24 shdubsh: installed patched nagios-nrpe-plugin and nagios-nrpe-server on icinga1001 - T207775
  • 15:36 elukey: shutdown aqs1006 to replace one broken disk - T206915
  • 15:31 SMalyshev: depooling wdqs1003 again, it's not catching up like the other hosts
  • 15:16 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [Beta Cluster] Re-enable WBMI on Beta Commons (duration: 00m 54s)
  • 15:11 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "logging: Disable Wikibase.NewItemIdFormatter channel" (duration: 00m 55s)
  • 15:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgLexemeEnableRepo for wikidatas gerrit:469625 (duration: 00m 55s)
  • 15:02 godog: test rsyslog 8.38 upgrade on lithium - T136312
  • 14:28 elukey: upgrade druid on druid100[4-6] to Druid 0.12.3
  • 14:20 banyek: running dns update (gerrit patch: 467711)
  • 13:48 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on all wikis (T166733) (duration: 00m 55s)
  • 13:46 godog: reformat ms-be2043 xfs filesystems - T199198
  • 13:29 XioNoX: test successful, rollback add term return-tcp permit on cr2-codfw
  • 13:28 XioNoX: test add term return-tcp permit on cr2-codfw
  • 12:14 volans: rebooting cumin1001 to pick new kernel and clear any potential weird state after OOMs
  • 12:01 zeljkof: EU SWAT finished
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Johannesburg Event on 2018-10-27 (T207742) (duration: 00m 55s)
  • 11:09 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop collecting data CitationUsage and CitationUsagePageLoad (T191086 T203253) (duration: 00m 57s)
  • 10:57 volans: restart pdfrender on scb1003
  • 10:11 elukey: upgrade druid100[1-3] to druid 0.12.3
  • 09:51 gehel: resetting deployment directory on wdqs1003
  • 09:15 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@84bf1ad]: Upgrade to 1.8.1 (duration: 00m 10s)
  • 09:15 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@84bf1ad]: Upgrade to 1.8.1
  • 09:10 ema: resume cache hosts rolling reboots for kernel/microcode updates T203011
  • 07:16 vgutierrez: Uploaded certcentral 0.3 to apt.wikimedia.org (stretch) - T207737 T207478
  • 07:11 moritzm: installing requests security updates on trusty
  • 06:17 SMalyshev: depooling wdqs1003 again, it's not catching up like the other hosts
  • 06:06 elukey: upload druid 0.12.3-1 debs to stretch-wikimedia

2018-10-24

  • 23:24 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/469495/ (duration: 00m 54s)
  • 23:15 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/462040/ (duration: 00m 55s)
  • 23:08 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy csp report-only to small.dblist wikis T207900 (duration: 00m 56s)
  • 22:38 bawolff@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy csp report-only to outreachwiki T207900 (duration: 00m 54s)
  • 22:36 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy csp report-only to outreachwiki T207900 (duration: 00m 54s)
  • 22:33 bawolff@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 22:27 eileen_: civicrm revision changed from 1c0a1b2406 to 97506677e8, config revision is c0a8be03a1
  • 21:33 banyek: compressing tables in s1@dbstore2002 (T204930)
  • 21:26 banyek: pausing replication on dbstore2002 (T204930)
  • 19:38 twentyafterfour: The train is now blocked by database lock contention of unknown origin
  • 19:31 twentyafterfour: the errors were all coming from wmf.26 but the error rate skyrocketed after deploying 1.33.0-wmf.1 to group1 so there is some query in the new branch which is holding a lock. T207881
  • 19:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.1 refs T206655
  • 18:16 XioNoX: enable BGP sessions to transit/peering on cr2-eqord - T204170
  • 17:20 gehel: repooling all elasticsearch servers in eqiad
  • 17:12 cmjohnson1: rebooting cloudvirt1019
  • 17:04 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [Beta Cluster] Re-disable WBMI on Beta Commons for now T180981 (duration: 00m 54s)
  • 17:03 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 16:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [Beta Cluster] Re-disable WBMI on Beta Commons for now T180981 (duration: 00m 54s)
  • 16:31 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:469444 Wikibase.php, don't load wikidata repo settings on other repos (take 2) (duration: 00m 54s)
  • 16:04 XioNoX: power-off cr1-eqord - T204170
  • 16:00 twentyafterfour: 15:59:06 Synchronized php-1.33.0-wmf.1/extensions/EventBus/: revert "Set event datetime with microsecond resolution." on 1.33.0-wmf.1 refs T207817 (duration: 00m 56s)
  • 15:59 XioNoX: disable BGP sessions to transit/peering on cr1-eqord - T204170
  • 15:54 twentyafterfour: deploying https://gerrit.wikimedia.org/r/469451
  • 14:23 herron: scheduled icinga downtime and disabling puppet on logstash hosts. deploying role::kafka::logging to logstash elasticsearch data hosts
  • 13:35 XioNoX: pre-configure switch ports for labvirt1007/8/9/12:eth1 in cloud-virt-instance-trunk range on asw2-b-eqiad
  • 13:17 ema: begin cache hosts rolling reboots for kernel/microcode updates T203011
  • 12:24 ema: cp-ats: upgrade trafficserver to 8.0.0-1wm1 T204232
  • 12:12 ema: cp1072: upgrade trafficserver to 8.0.0-1wm1 T204232
  • 11:22 ema: cp1071: upgrade trafficserver to 8.0.0-1wm1 T204232
  • 10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore db1092 and db1104 original weight (duration: 00m 52s)
  • 10:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1092 and starting to restore db1104 original weight (duration: 00m 54s)
  • 10:28 marostegui: Compare revision table on dewiki cebwiki shwiki srwiki mgwiktionary enwikivoyage on db1100 and db2075 - T184805
  • 09:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1092 (duration: 00m 54s)
  • 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1092 (duration: 00m 54s)
  • 09:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 and db1087 (duration: 01m 05s)
  • 08:55 marostegui: Stop MySQL for upgrade and reboot on db1087
  • 08:47 marostegui: Update MySQL on db1092 for upgrade and reboot
  • 08:03 godog: fix aggregation to 'sum' for MediaWiki.RevisionSlider - T205416
  • 07:33 gehel: powercycling wdqs1010 - T207817
  • 07:19 _joe_: powercycling wdqs1009
  • 07:04 elukey: powercycle wdqs1008
  • 06:59 elukey: powercycle wdqs1007
  • 06:55 elukey: powercycle wdqs1006 (depool first)
  • 06:46 elukey: powercycle wdqs1005
  • 06:42 SMalyshev: repooled wdqs1003
  • 06:35 _joe_: powercycling wdqs[2001-2002,2004-2006].codfw.wmnet, one at a time
  • 06:33 elukey: powercycle wdqs1004
  • 05:24 kartik@deploy1001: Finished deploy [cxserver/deploy@80dc518]: Update cxserver to 9ad60d9 (T207445) (duration: 04m 06s)
  • 05:20 kartik@deploy1001: Started deploy [cxserver/deploy@80dc518]: Update cxserver to 9ad60d9 (T207445)
  • 02:34 mutante: powercycled wdqs1009 - by request
  • 02:24 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d4692ea]: Reverting update on wdqs1003 to fix wdqs-updater issue (duration: 00m 03s)
  • 02:24 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d4692ea]: Reverting update on wdqs1003 to fix wdqs-updater issue
  • 02:12 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d4692ea]: Reverting update on wdqs1003 to fix wdqs-updater issue (duration: 00m 23s)
  • 02:12 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d4692ea]: Reverting update on wdqs1003 to fix wdqs-updater issue
  • 01:56 tstarling@deploy1001: Synchronized php-1.33.0-wmf.1/includes/page/WikiPage.php: T207530 (duration: 00m 53s)
  • 01:46 tstarling@deploy1001: Synchronized php-1.32.0-wmf.26/includes/page/WikiPage.php: fix deletion performance regression T207530 (duration: 00m 55s)
  • 01:37 bawolff: deployed T207750
  • 00:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.1 refs T206655
  • 00:26 twentyafterfour: finished with mediawiki train for group0 refs T206655
  • 00:08 twentyafterfour@deploy1001: Finished scap: syncing 1.33.0-wmf.1 refs T206655 (duration: 36m 58s)

2018-10-23

  • 23:31 twentyafterfour@deploy1001: Started scap: syncing 1.33.0-wmf.1 refs T206655
  • 23:30 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/includes/export/WikiExporter.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/469319/ refs T207628 (duration: 01m 39s)
  • 22:16 eileen: civicrm revision changed from bde28d4453 to 1c0a1b2406, config revision is c0a8be03a1
  • 22:14 twentyafterfour: scap prep 1.33.0-wmf.1
  • 21:47 mutante: icinga1001 - replacing check_ping with check_fping as the standard host check command, for faster host checks (another tip from Nagios Tuning guide, still manual testing) (T202782)
  • 21:30 mutante: icinga1001 - changing check_result_reaper_frequency from 10 to 3, trying to lower average check latency. "allow faster check result processing -> requires more CPU" (T202782)
  • 19:31 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssuesLogger.js: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/469244/ refs T207423 (duration: 00m 48s)
  • 19:27 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/469244/
  • 19:22 bawolff: deploy patch T207778
  • 18:17 mutante: icinga - performance/latency comparison - https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=4 vs https://icinga-stretch.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=4 (T202782)
  • 18:13 mutante: icinga1001 - manually set max_concurrent_checks to 0 (unlimited), restart icinga, keep puppet disabled, for testing (it ran into the limit of 10000 all the time, causing lots of logging, and the CPU power is actually slightly lower than on einsteinium) (T202782) refs: Nagios Tuning, point 7 https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/tuning.html
  • 17:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA: Set wmgWikibaseCachePrefix for commonswiki I0badd355723 (duration: 00m 46s)
  • 17:18 ejegg: updated standalone SmashPig deploy from 2292111bda to b638ca02bc
  • 17:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: For WBMI, intentionally rather than implicitly install Wikibase I38574e670 (duration: 00m 47s)
  • 17:13 mutante: icinga1001 rm /var/log/user.log.1 - was 14G and using 25% of the / partition and server out of disk :/
  • 17:06 ejegg: rolled SmashPig back to 2292111bda
  • 17:03 ejegg: updated standalone SmashPig deployment from 2292111bda to 18da9727d8
  • 16:20 volans: restarted pdfrender on scb1004
  • 14:47 herron: added confluent-kafka-2.11 1.1.0-1 package to jessie-wikimedia/thirdparty T206454
  • 14:34 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on group 1 (T166733) (duration: 00m 46s)
  • 14:22 anomie@deploy1001: Synchronized php-1.32.0-wmf.26/includes/filerepo/file/LocalFile.php: Backport for T207419 (duration: 00m 47s)
  • 14:02 gehel: repooling / banning elastics1031 - T207724
  • 14:01 moritzm: installing spice security updates
  • 14:00 ema: upload trafficserver 8.0.0-1wm1 to stretch-wikimedia/main T204232
  • 13:49 gehel: depooling / banning elastics1031 - T207724
  • 13:43 gehel: depooling / banning elastics1029 - T207724
  • 13:35 gehel: rolling restart of blazegraph for change to blazegraph home dir
  • 13:22 gehel: depooling / banning elastics1018 - T207724
  • 12:29 gehel: depooling / banning elastics1028 and 1030 - T207724
  • 11:23 zeljkof: EU SWAT finished
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Wikipedia in Ort (T207714) (duration: 00m 46s)
  • 11:11 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable RCPatrol for srwikiquote (T207732) (duration: 00m 47s)
  • 10:13 ema: upload libc++ 6.0.1 to stretch-wikimedia/main T204232
  • 09:42 jynus: stopping db1087 to fix db1124
  • 09:31 gehel: depooling / banning elastics1017 and 1022 - T207724
  • 09:13 godog: roll-restart thumbor to send statsd traffic through statsd_exporter - T205870
  • 08:08 godog: update hp firmware to 6.60 on ms-be2017 - T141756
  • 07:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 - T184805 (duration: 00m 48s)
  • 06:50 elukey: powercycle ms-be2017 (frozen since ~8hrs ago)
  • 06:42 elukey: restart yarn and hdfs daemon on analytics1068 to pick up correct config (the host was down since before we swapped the Hadoop masters due to hw failure)
  • 06:39 marostegui: Stop replication on db1092 and db1087 for checking T206743
  • 06:02 marostegui: Deploy schema change on s3 - T207359
  • 00:35 SMalyshev: temp depooled wdqs1003 to let it catch up
  • 00:17 Amir1: evening SWAT is done

2018-10-22

  • 23:59 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.26/includes/changetags/ChangeTags.php: SWAT: Fix bad join on ChangeTag subquery (T207313) (duration: 00m 47s)
  • 23:39 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@d4692ea]: Redeploy Updater for T207673 (duration: 10m 12s)
  • 23:29 smalyshev@deploy1001: Started deploy [wdqs/wdqs@d4692ea]: Redeploy Updater for T207673
  • 22:12 pmiazga@deploy1001: Synchronized wmf-config//InitialiseSettings-labs.php: SWAT: beta: Disable page issues A/B test on beta cluster only (T200792) (duration: 00m 46s)
  • 21:44 mutante: adding new prod ServerAlias punjabi.wikimedia.org to Apache cluster (T207583)
  • 21:13 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Revert LibreNMS upgrade - T207481 (duration: 00m 08s)
  • 21:13 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Revert LibreNMS upgrade - T207481
  • 21:08 andrewbogott: rebooting cloudvirt1023
  • 20:52 ayounsi@deploy1001: Finished deploy [librenms/librenms@737683a]: Upgrade LibreNMS to 1.44 - T207481 (duration: 00m 10s)
  • 20:52 ayounsi@deploy1001: Started deploy [librenms/librenms@737683a]: Upgrade LibreNMS to 1.44 - T207481
  • 20:29 ladsgroup@deploy1001: Finished deploy [ores/deploy@e89e880]: Use redis task tracker (T152012) (duration: 22m 02s)
  • 20:06 ladsgroup@deploy1001: Started deploy [ores/deploy@e89e880]: Use redis task tracker (T152012)
  • 18:54 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Deploy TemplateWizard everywhere T202545, re-try (duration: 00m 45s)
  • 18:50 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [Beta] Temporarily disable WBMI from Beta Commons whilst Wikibase is fixed T180981 (duration: 00m 46s)
  • 18:38 jforrester@deploy1001: Synchronized php-1.32.0-wmf.26/resources/src/mediawiki.rcfilters/styles/mw.rcfilters.ui.ChangesListWrapperWidget.highlightCircles.seenunseen.less: SWAT RCFilters: Fix highlight circles for unseen changes T207472 (duration: 00m 46s)
  • 18:36 jforrester@deploy1001: Synchronized php-1.32.0-wmf.26/skins/MinervaNeue/resources/skins.minerva.scripts/pageIssuesLogger.js: SWAT Fix reading depth logging part 2 T207423 (duration: 00m 46s)
  • 18:35 jforrester@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.readingDepth.js: SWAT Fix reading depth logging part 1 T207423 (duration: 00m 46s)
  • 18:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Add TemplateWizard to the BF allow list T205290 (duration: 00m 48s)
  • 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [Beta Cluster] Load but don't enable MediaInfo on Beta Commons cf. T180981 (duration: 00m 45s)
  • 18:00 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: For WikibaseMediaInfo wikis, load basic Wikibase repo code cf. T180981 (duration: 00m 46s)
  • 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow enablement of the WikibaseMediaInfo, still off everywhere cf. T180981 (duration: 00m 48s)
  • 17:46 jforrester@deploy1001: Synchronized wmf-config/extension-list: Add WikibaseMediaInfo i18n to cache cf. T180981 (duration: 00m 46s)
  • 17:40 mobrovac@deploy1001: Finished deploy [proton/deploy@b3e254a]: Update Puppeteer to v1.9.0 - T207416 (duration: 01m 34s)
  • 17:38 mobrovac@deploy1001: Started deploy [proton/deploy@b3e254a]: Update Puppeteer to v1.9.0 - T207416
  • 17:34 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@975a67b]: WDQS deployment - GUI update and binaries upgrade (duration: 11m 47s)
  • 17:23 XioNoX: enable cr2:xe-4/0/0 (to asw-a) for optics replacement - T203719
  • 17:22 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@975a67b]: WDQS deployment - GUI update and binaries upgrade
  • 17:20 XioNoX: disable cr2:xe-4/0/0 (to asw-a) for optics replacement - T203719
  • 17:19 elukey@deploy1001: Finished deploy [analytics/refinery@1de5f44]: Deploy new version of Camus and pageview whitelist (duration: 07m 05s)
  • 17:16 cmjohnson1: analytics1068 down for mother board swap
  • 17:12 elukey@deploy1001: Started deploy [analytics/refinery@1de5f44]: Deploy new version of Camus and pageview whitelist
  • 17:11 XioNoX: re-enable puppet fleet-wide for puppetmaster1001 uplink move
  • 17:10 XioNoX: moving puppetmaster1001 uplink to asw2-b
  • 17:08 andrew@deploy1001: Finished deploy [horizon/deploy@431a55d]: Rolling out fix for T207510 (duration: 03m 40s)
  • 17:07 XioNoX: disable puppet fleet-wide for puppetmaster1001 uplink move
  • 17:05 andrew@deploy1001: Started deploy [horizon/deploy@431a55d]: Rolling out fix for T207510
  • 16:24 arturo: T206261 2h icinga downtime cloudnet1003/4 for another patch
  • 15:54 ejegg: updated payments-wiki from 06848600ed to 5a7a8e7e4f
  • 15:51 ejegg: updated fundraising CiviCRM from 1f10dc8a18 to bde28d4453
  • 15:35 XioNoX: push firewall changes to pfw3-eqiad - T207175
  • 15:35 ejegg: updated standalone SmashPig deployment from 581c685326 to 2292111bda
  • 15:03 mforns@deploy1001: Finished deploy [analytics/refinery@bbebc20]: deploying refinery together with refinery-source v0.0.79 (duration: 10m 16s)
  • 14:52 mforns@deploy1001: Started deploy [analytics/refinery@bbebc20]: deploying refinery together with refinery-source v0.0.79
  • 14:49 XioNoX: push firewall changes to pfw3-codfw - T207175
  • 13:19 marostegui: Run myloader for enwikivoyage cebwiki shwiki srwiki mgwiktionary on db2052 (s5 codfw master) - T184805
  • 13:12 kartik@deploy1001: Finished deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445) (duration: 03m 53s)
  • 13:08 kartik@deploy1001: Started deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445)
  • 11:51 zeljkof: eu swat finished
  • 11:49 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable rollbacker right on srwikisource (T206935) (duration: 00m 46s)
  • 11:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable autopatroller, patroller and rollbacker rights on srwikiquote (T206936) (duration: 00m 49s)
  • 11:28 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity (T207300) (duration: 00m 46s)
  • 11:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cx2outreach campaign (T207031) (duration: 00m 47s)
  • 11:09 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Anniversary logo for cswiki (T207589) (duration: 00m 47s)
  • 11:06 zfilipin@deploy1001: sync-file aborted: SWAT: Test if logo specified in wgLogo/wgLogoHD exists (T207053) (duration: 00m 02s)
  • 10:03 arturo: icinga downtime for cloudnet1003/4 for T206261
  • 09:16 marostegui: Remove replication filters from db2052 (s5 codfw master) - T184805
  • 09:04 marostegui: Run mydumper on db1100 for enwikivoyage cebwiki shwiki srwiki mgwiktionary - T184805
  • 08:58 marostegui: Stop replication in sync on db1100 and db2052 (codfw master) to reimport wikis - T184805
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 - T184805 (duration: 00m 47s)
  • 08:29 moritzm: powercycling ms-be1018, stuck during reboot
  • 08:28 jynus: performing deletes on db1087 to fix wb_terms on labs
  • 08:27 marostegui: Deploy schema change on db2043 (s3 master) without replication - T204006
  • 08:22 marostegui: Disconnect codfw -> eqiad replication on s5 (db1070)
  • 08:19 marostegui: Disconnect codfw -> eqiad replication on s3 (db1075)
  • 08:13 marostegui: Disconnect codfw -> eqiad replication on es3 (es1017)
  • 08:11 marostegui: Disconnect codfw -> eqiad replication on es2 (es1015)
  • 08:08 marostegui: Disconnect codfw -> eqiad replication on x1 (db1069)
  • 08:05 marostegui: Disconnect codfw -> eqiad replication on s8 (db1071)
  • 08:03 marostegui: Disconnect codfw -> eqiad replication on s7 (db1062)
  • 08:01 marostegui: Disconnect codfw -> eqiad replication on s6 (db1061)
  • 07:59 marostegui: Disconnect codfw -> eqiad replication on s4 (db1068)
  • 07:57 marostegui: Disconnect codfw -> eqiad replication on s2 (db1066)
  • 07:52 marostegui: Disconnect codfw -> eqiad replication on s1 (db1067)
  • 07:38 moritzm: rebooting swift-be servers in eqiad for kernel security update
  • 07:24 godog: reformat ms-be2042 - T199198
  • 06:34 marostegui: Deploy schema change on db2036 - T204006
  • 06:11 marostegui: Deploy schema change on db2050 - T204006
  • 06:00 marostegui: Deploy schema change on db2057 - T204006
  • 05:47 marostegui: Deploy schema change on s3 db2074 (and db2094 sanitarium) - T204006
  • 05:31 marostegui: Deploy schema change on dbstore2002:3313 - T204006
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2033 BBU status (duration: 00m 49s)
  • 04:37 kartik@deploy1001: Finished deploy [cxserver/deploy@904151f]: Update cxserver to eee8974 (T207070, T203077, T199529) (duration: 05m 42s)
  • 04:31 kartik@deploy1001: Started deploy [cxserver/deploy@904151f]: Update cxserver to eee8974 (T207070, T203077, T199529)

2018-10-21

  • 22:15 onimisionipe: repooling wdqs1003 as it has caught up on lag
  • 20:42 banyek: resuming replication on s4@dbstore2002 (T204930)
  • 16:15 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 04m 52s)
  • 15:57 bawolff: adjust patch for T194204
  • 12:39 onimisionipe: depooling wdqs1003 to catchup on lag time

2018-10-20

  • 23:05 reedy@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/CentralAuth/: Update setEmail (duration: 00m 55s)
  • 21:29 gehel: repooling wdqs1003 (still some lag, but 100[45] start to be impacted)
  • 19:54 gehel: depooling wdqs1003 to catch up on lag
  • 13:53 reedy@deploy1001: Synchronized php-1.32.0-wmf.26/includes/auth/AuthManager.php: (no justification provided) (duration: 00m 55s)
  • 12:46 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add CentralAuth related permissions to stewards at metawiki (T207531) (duration: 01m 09s)
  • 05:38 marostegui: Force writeback on db2033 - T184888

2018-10-19

  • 20:33 twentyafterfour: deployed RCFilters: Fix completely broken highlight circles refs T207472
  • 20:32 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/resources/src/mediawiki.rcfilters/styles/: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ (duration: 00m 54s)
  • 20:31 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ to the full cluster.
  • 20:28 twentyafterfour: deployed https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ to mwdebug1002
  • 19:20 mutante: ns0 / ns1 - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsdctl reload-zones - to add new language shn (T206777)
  • 19:16 mutante: ns2/multatuli - gdnsdctl reload-zones
  • 19:12 mutante: labweb1001 / wikitech - disabling 2fa for myself, logging in, re-enabling it again
  • 17:49 ejegg: updated fundraising CiviCRM from 83874e75ba to 1f10dc8a18
  • 17:47 mutante: DNS - 'authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones' - needed when adding new languages to langs.tmpl - adding "shn" (Shan language) T206777
  • 16:36 XioNoX: deactivate BGP to 15426 in ams-ix (down and no reply to emails) - T207428
  • 14:16 banyek: disconnecting s4 replication on dbstore2002 (T204930)
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove useless comments (duration: 00m 54s)
  • 13:58 vgutierrez: Uploaded certcentral 0.2 to apt.wikimedia.org (stretch) - T207457
  • 11:46 banyek: starting compression of s4 tables @dbstore2002 (T204930)
  • 11:33 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207313 UBN - Revert back wikidata for change_tag backend (duration: 00m 59s)
  • 10:53 arturo: icinga downtime for 2h for cloudnet1003/1004 to deploy patch related to T206261
  • 09:37 godog: bump /proc/sys/net/core/rmem_default temporarily to 6MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 08:53 banyek: adding wmf-pt-kill_2.2.20-1+wmf4 package for stretch (T206521)
  • 08:28 jynus: stopping db1092 and db1087 in sync
  • 07:50 godog: bump /proc/sys/net/core/rmem_default temporarily to 2MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 07:20 marostegui: Remove mwmaint1001 grants from m5 - https://phabricator.wikimedia.org/T201343 https://phabricator.wikimedia.org/T192457
  • 07:15 godog: powercycle ms-be1021, [19601329.556259] sd 0:1:0:1: rejecting I/O to offline device
  • 07:05 godog: bump /proc/sys/net/core/rmem_default temporarily to 1MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 06:13 marostegui: Deploy schema change on s7 codfw host by host without replication - T204006
  • 05:58 marostegui: Deploy schema change on s2 codfw host by host without replication - T204006
  • 05:25 marostegui: Deploy schema change on s1 codfw host by host without replication - T204006
  • 01:49 krinkle@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Ic74a9d5601b8c (duration: 00m 55s)

2018-10-18

  • 22:00 mutante: lvs1011,lvs1012 - manually editing nagios NRPE config and restarting service (to make monitoring from icinga1001 work and puppet is disabled)
  • 21:52 mutante: eeden - manually editing nagios NRPE config and restarting service (to make monitoring from icinga1001 work and puppet is disabled)
  • 21:49 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.32.0-wmf.26 refs T191072
  • 21:46 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/includes/filerepo/file/LocalFile.php: sync Id97e1c refs T207419 (duration: 00m 53s)
  • 21:29 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/includes/filerepo/file/LocalFile.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468470/ refs T207419 (duration: 00m 54s)
  • 20:49 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.32.0-wmf.24 refs T191072
  • 20:39 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.26
  • 20:21 volans: start ferm on db2042, it failed to start at reboot due to DNS resolution timeout
  • 19:22 ejegg: updated SmashPig standalone deploy from 5f21d3f2db to 581c685326
  • 19:21 ejegg: updated payments-wiki from a3892e4ed3 to 06848600ed
  • 19:17 shdubsh: rebooting graphite1004
  • 19:11 shdubsh: upping ring buffer size on graphite1004 in an attempt to mitigate dropped packets at the interface -- T196484
  • 19:02 sbisson@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/PageTriage/: SWAT: Use Main Object Stash for keeping track of PageTriage last use (duration: 00m 54s)
  • 18:19 awight: Restarting ORES services for T88997
  • 17:33 ladsgroup@deploy1001: Finished deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256 (duration: 23m 48s)
  • 17:19 herron: aborted enabling kafka on logstash elasticsearch cluster due to puppet errors. reverted change T206454
  • 17:09 ladsgroup@deploy1001: Started deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256
  • 17:00 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.32.0-wmf.26 refs T191072 (duration: 00m 53s)
  • 16:59 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.26 refs T191072
  • 16:57 herron: enabling kafka on logstash elasticsearch cluster T206454
  • 16:55 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/WikibaseQualityConstraints/src/ServiceWiring.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseQualityConstraints/+/468352/ refs T207394 (duration: 00m 54s)
  • 16:52 mobrovac@deploy1001: Finished deploy [restbase/deploy@6c879fa]: Have 100% of traffic directed to Proton as well - T186748 (duration: 20m 52s)
  • 16:31 mobrovac@deploy1001: Started deploy [restbase/deploy@6c879fa]: Have 100% of traffic directed to Proton as well - T186748
  • 15:51 XioNoX: trunk cloud-instances2-b-eqiad between asw-b-eqiad and asw2-b-eqiad
  • 15:50 cmjohnson1: disabling checks on cloudvirt1019 for maintenance
  • 15:42 twentyafterfour: twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.24 refs T191072 (duration: 00m 53s)
  • 15:35 twentyafterfour@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:46 moritzm: installing tomcat8 security updates
  • 14:34 moritzm: remove labvirt1018 from debmonitor (T207317)
  • 14:28 godog: temporarily bump default socket receive memory to 1MB on graphite1001, restart statsd-proxy and statsite
  • 14:22 godog: begin reformat of ms-be2041 - T199198
  • 14:21 banyek: shutting down mysql and powering down db2042 (T202051)
  • 14:13 godog: corrections to the statements above, graphite1004 not graphite1001
  • 14:11 godog: ditto for statsite instances on graphite1001, temporarily bump receive socket memory to 1MB and bounce the service
  • 14:08 godog: temporarily bump receive socket memory for statsd-proxy on graphite1001 and bounce the service
  • 13:51 moritzm: installing libidn security updates
  • 12:59 moritzm: installing libssh security updates
  • 12:55 godog: bounce statsd-proxy on graphite1001
  • 11:59 addshore: SWAT done
  • 11:59 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Wikidata.org: enable sense data type T203888 (duration: 00m 54s)
  • 11:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@1041a02]: Disable onthisday check - T203588 (duration: 21m 23s)
  • 11:54 zfilipin@deploy1001: Synchronized tests/InitialiseSettingsTest.php: SWAT: Test if logo specified in wgLogo/wgLogoHD exists (T207053) (duration: 00m 53s)
  • 11:49 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix typo in IS.php: use ltwiki instead of ltwikipedia (T207081) (duration: 00m 54s)
  • 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use testwikidatawiki instead of testwikidata in IS.php (T207089) (duration: 00m 53s)
  • 11:33 mobrovac@deploy1001: Started deploy [restbase/deploy@1041a02]: Disable onthisday check - T203588
  • 11:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use new wordmarks in uzwiki (T205226) (duration: 00m 53s)
  • 11:10 zfilipin@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: Upload uz specific wordmark (T205226) (duration: 00m 54s)
  • 10:59 addshore: wikidata senses deploy slot done
  • 10:57 addshore: addshore@mwmaint1002:~$ mwscript purgeList.php --wiki wikidatawiki --namespace 146
  • 10:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #4 (duration: 03m 52s)
  • 10:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: RejectParserCacheValue Wikidata lexemes before sense deployment T203888 (duration: 00m 54s)
  • 10:54 addshore@deploy1001: sync-file aborted: RejectParserCacheValue Wikidata lexemes before sense deployment T203888 (duration: 00m 00s)
  • 10:53 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #4
  • 10:53 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #3 (duration: 04m 13s)
  • 10:51 addshore@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikibaseLexeme: Wikidata: Make statement group IDs on Senses unique (duration: 00m 59s)
  • 10:49 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #3
  • 10:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #2 (duration: 07m 32s)
  • 10:41 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #2
  • 10:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call - T203588 (duration: 11m 24s)
  • 10:34 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: Combine if blocks in Wikibase-production NOOP (duration: 00m 53s)
  • 10:32 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896 (duration: 00m 29s)
  • 10:31 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896
  • 10:31 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896 (duration: 01m 37s)
  • 10:31 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY Remove wgLexemeEnableSenses from IS-labs (duration: 00m 53s)
  • 10:30 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call - T203588
  • 10:28 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (4) - T205896 (duration: 00m 05s)
  • 10:28 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (4) - T205896
  • 10:15 addshore: purging wikidata lexemes
  • 10:12 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (3) - T205896 (duration: 00m 29s)
  • 10:11 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (3) - T205896
  • 10:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable senses on wikidatawiki T203888 (duration: 00m 53s)
  • 10:09 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (2) - T205896 (duration: 02m 01s)
  • 10:07 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (2) - T205896
  • 10:00 volans@deploy1001: Finished deploy [netbox/deploy@438f1c0]: Upgrade to upstream v2.4.6 - T205896 (duration: 03m 07s)
  • 09:57 volans@deploy1001: Started deploy [netbox/deploy@438f1c0]: Upgrade to upstream v2.4.6 - T205896
  • 09:52 XioNoX: activate bgp group Customer6 on cr4-ulsfo
  • 09:20 banyek: enabling replication monitor check on pc1005 pc1006 pc2005 pc2006 (T206992)
  • 09:18 godog: bounce statsd-proxy on graphite1001
  • 09:08 moritzm: powercycling ms-be2019, stuck during reboot
  • 09:01 banyek: enabling replication monitor check on pc1004 (T206992)
  • 08:56 banyek: enabling replication monitor check on pc2004 (T206992)
  • 08:41 banyek: disabling puppet on parser caches (T206992)
  • 08:40 banyek: adding replication monitoring checks to parsercache hosts (T206992)
  • 08:26 vgutierrez: Uploaded certcentral 0.1-2 to apt.wikimedia.org (stretch)
  • 07:56 moritzm: rebooting swift backend servers in codfw for spectre v3/v4/L1TF security updates
  • 07:43 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: Wikidata dispatch: reduce concurrent dispatchers to 2 (duration: 00m 59s)
  • 05:34 marostegui: Restarting a failed s8 backup from dbstore1001 to db1116:3318
  • 05:05 XioNoX: start office-DC link renumbering - T205985
  • 02:51 ejegg: updated fundraising CiviCRM from 7b8d33bb4e to 83874e75ba
  • 00:32 twentyafterfour: restarting apache on phab1001 to apply b3bfff1

2018-10-17

  • 22:56 awight: Restarting ORES uwsgi service for T88997
  • 22:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.26 refs T191072
  • 22:36 robh: bast4001 reboot is my fault, power cables were jostled when I was decommissioning lvs4002 right above it in the rack
  • 22:31 ejegg: updated fundraising CiviCRM from 5eac0634e6 to 7b8d33bb4e
  • 22:24 ejegg: updated payments-wiki from 0385ad02a7 to a3892e4ed3
  • 22:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@88c8f26] (dev-cluster): Spread requests between MCS nodes for onthisday (duration: 02m 54s)
  • 22:18 ppchelko@deploy1001: Started deploy [restbase/deploy@88c8f26] (dev-cluster): Spread requests between MCS nodes for onthisday
  • 20:50 arlolra: Updated Parsoid to e6b708b (T204622, T187848, T207093)
  • 20:40 arlolra@deploy1001: Finished deploy [parsoid/deploy@babf1da]: Updating Parsoid to e6b708b (duration: 08m 41s)
  • 20:32 arlolra@deploy1001: Started deploy [parsoid/deploy@babf1da]: Updating Parsoid to e6b708b
  • 20:17 mobrovac@deploy1001: Started restart [proton/deploy@a657059]: (no justification provided)
  • 20:10 ejegg: updated fundraising CiviCRM from 4cc21d61c5 to 5eac0634e6
  • 19:26 shdubsh: restart eventlogging for statsd DNS change - T88997
  • 19:23 twentyafterfour: Mediawiki train is still blocked by T207288
  • 19:19 godog: restart zuul for statsd DNS change - T88997
  • 19:12 mutante: scb1003 - restart pdfrender
  • 19:09 godog: roll-restart eventbus for statsd DNS change - T88997
  • 19:00 krinkle@deploy1001: Synchronized php-1.32.0-wmf.26/includes/cache/: T193271 - I25aa0e27200a0 (duration: 01m 01s)
  • 18:57 awight: Restarting ORES cluster to refresh DNS, T88997
  • 18:48 banyek: repooling labsdb1009 (T181650)
  • 18:48 shdubsh: restart navtiming on webperf nodes
  • 18:39 godog: restart jmxtrans on kafka hosts
  • 18:17 shdubsh: moving statsd cname to graphite1004
  • 18:07 banyek: depooling labsdb1009 (T181650)
  • 17:08 banyek: depooling labsdb1009 (T181650)
  • 16:53 banyek: repooling labsdb1011
  • 15:53 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/AbuseFilter/: sync AbuseFilter revision 4e2a6b6 to 1.32.0-wmf.26 refs T207220 (duration: 00m 58s)
  • 15:34 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: Enabling db2096 for x1 (duration: 00m 56s)
  • 15:31 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: Enabling db2096 for x1 (duration: 00m 56s)
  • 15:28 banyek: enabling db2096 for cluster x1 (T206593)
  • 14:33 godog: upload prometheus-statsd-exporter 0.7.0+ds1-2 - T205870
  • 14:01 marostegui: Repool labsdb1010, depool labsdb1011 - T181650
  • 13:08 gehel: applying rps NIC config for all wdqs nodes - T206105
  • 13:05 banyek: depooling labsdb1010 (T181650)
  • 12:56 banyek: enabling notifications on db2096 (T206593)
  • 12:55 banyek: enabling notifications on db2096
  • 11:40 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change tag everywhere (T194164) (duration: 00m 57s)
  • 11:32 moritzm: installing graphicsmagick security updates
  • 11:30 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T206702 Enable client side error counting on Minerva (duration: 00m 57s)
  • 11:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T207196 gerrit:467736 Wikidata: enable JSON-LD data format on test.wikidata.org (duration: 00m 56s)
  • 11:21 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: T207196 Wikidata: add setting for setting the enabled entity data forms gerrit:467735 PT 2/2 (duration: 00m 56s)
  • 11:19 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T207196 Wikidata: add setting for setting the enabled entity data forms gerrit:467735 PT 1/2 (duration: 00m 57s)
  • 11:17 Amir1: ladsgroup@mwmaint1002:~$ mwscript deleteLocalPasswords.php --wiki=enwiki --delete --batch-size 200 (This will cause lag on codfw)
  • 11:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: T205611 T205330 Remove Wikidata RejectParserCacheValue hook gerrit:467913 (duration: 00m 56s)
  • 11:11 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Increase wikidata dispatch randomness to 30 (duration: 00m 56s)
  • 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: SWAT: T207019 gerrit:467343 Enable WBQualityConstraintsSuggestionsBetaFeature on wikidatawiki (duration: 00m 56s)
  • 11:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:467691 Add constraint-suggestions to wgBetaFeaturesWhitelist (duration: 01m 10s)
  • 11:04 ariel@deploy1001: Finished deploy [dumps/dumps@ed7eed9]: use lbzip2 for recombine steps if configured (duration: 00m 03s)
  • 11:04 ariel@deploy1001: Started deploy [dumps/dumps@ed7eed9]: use lbzip2 for recombine steps if configured
  • 09:34 XioNoX: update interfaces and BGP IPs for office-DC link (DC side, interfaces still disabled) - T205985
  • 09:30 banyek: truncating parsercache tables on pc2006 (T206740)
  • 09:12 _joe_: reenabling puppet (not running it) in codfw
  • 09:12 _joe_: change applied to all appservers serving traffic
  • 09:08 _joe_: running puppet on all apaches (appserver/api) in eqiad to pick up the wikipedia.org vhost refactor
  • 09:05 _joe_: running puppet on mwdebug1001, then testing again wikipedia.org for regressions
  • 09:04 _joe_: puppet disabled on the appservers, now merging the wikipedia.org conversion to mediawiki::web::vhost
  • 08:43 mobrovac@deploy1001: Started restart [proton/deploy@a657059]: (no justification provided)
  • 08:30 kartik@deploy1001: Finished deploy [cxserver/deploy@b30a323]: Update cxserver to 29e01e4 (T206305, T204668) (duration: 03m 54s)
  • 08:27 kartik@deploy1001: Started deploy [cxserver/deploy@b30a323]: Update cxserver to 29e01e4 (T206305, T204668)
  • 08:09 banyek: stopping binlog purgers on the parsercache hosts (the binlogs will be kept for 24hrs) - T206740
  • 08:00 banyek: truncating parsercache tables on pc2005 (T206740)
  • 06:52 jynus: fixing s8 master drifts T206743
  • 02:10 ejegg: updated payments-wiki from 7fb1aae963 to 0385ad02a7
  • 01:24 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Add REL1_32 to ExtensionDistributor (duration: 00m 59s)

2018-10-16

  • 22:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 6 (duration: 01m 18s)
  • 22:09 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 6
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 5 (duration: 05m 16s)
  • 22:04 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 5
  • 22:04 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 4 (duration: 03m 53s)
  • 22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 4
  • 22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 3 (duration: 04m 15s)
  • 21:58 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.24 refs T191072
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 3
  • 21:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 2 (duration: 09m 11s)
  • 21:46 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 2
  • 21:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required (duration: 03m 53s)
  • 21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required
  • 21:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.26 refs T191072
  • 20:55 twentyafterfour@deploy1001: Finished scap: Syncing 1.32.0-wmf.26 refs T191072 (duration: 26m 32s)
  • 20:28 twentyafterfour@deploy1001: Started scap: Syncing 1.32.0-wmf.26 refs T191072
  • 20:14 shdubsh: restarted pdfrender on scb1003
  • 18:44 ppchelko@deploy1001: Started restart [proton/deploy@a657059]: Try restarting again for metrics
  • 18:43 ppchelko@deploy1001: Started restart [proton/deploy@a657059]: Try restarting again for metrics
  • 18:42 ppchelko@deploy1001: Finished deploy [proton/deploy@a657059]: Try restarting for metrics (duration: 00m 20s)
  • 18:42 ppchelko@deploy1001: Started deploy [proton/deploy@a657059]: Try restarting for metrics
  • 17:01 _joe_: restarted pdfrender on scb1004
  • 16:33 akosiaris: depool restbase-async from eqiad in order to test traffic going to parsoid codfw
  • 16:15 _joe_: disabled puppet on all appservers, merging wikidata apache change, re-enabling puppet on mwdebug1001 for testing
  • 14:51 mobrovac@deploy1001: Finished deploy [proton/deploy@a657059]: Rollback to puppeteer v1.5.0 - T186748 (duration: 00m 49s)
  • 14:51 mobrovac@deploy1001: Started deploy [proton/deploy@a657059]: Rollback to puppeteer v1.5.0 - T186748
  • 14:28 godog: roll-restart elasticsearch on logstash100[456] to change elasticsearch data dir - T206454
  • 14:06 godog: depool in turn logstash1008 and logstash1009 to change elasticsearch data dir - T206454
  • 13:55 godog: depool logstash1007 to change elasticsearch data dir - T206454
  • 13:54 XioNoX: router back and healthy, enable external BGP sessions on cr2-eqdfw - T203261
  • 13:51 moritzm: rebooting acamar for update to stretch-proposed-updates kernel
  • 13:44 XioNoX: reboot cr2-eqdfw for upgrade - T203261
  • 13:43 XioNoX: disable external BGP sessions on cr2-eqdfw - T203261
  • 13:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on group 0 (T166733) (duration: 00m 50s)
  • 13:34 XioNoX: start install process on cr2-eqdfw (non impacting before reboot) - T203261
  • 13:11 akosiaris: pool codfw for apertium|citoid|cxserver|eventbus|eventstreams|graphoid|mathoid|mobileapps|ores|parsoid|pdfrender|proton|recommendation-api|restbase|restbase-async|wdqs|wdqs-internal|zotero
  • 13:11 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=^apertium|citoid|cxserver|eventbus|eventstreams|graphoid|mathoid|mobileapps|ores|parsoid|pdfrender|proton|recommendation-api|restbase|restbase-async|wdqs|wdqs-internal|zotero$
  • 13:08 elukey: restart memcached on mc1035 with -R 200 (will wipe the object cache shard as consequence) - T203786
  • 12:57 akosiaris: pool mathoid eqiad
  • 12:52 gtirloni: T186571 removed legofan4000 user from project-tools group (leftover from T165624 legofan4000->macfan4000 rename)
  • 12:44 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:43 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 12:43 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --reset-values -f mathoid.yaml [namespace: mathoid, clusters: eqiad]
  • 12:35 akosiaris: depool eqiad mathoid for helm chart upgrade
  • 12:32 akosiaris: pool codfw mathoid
  • 12:14 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:14 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 12:14 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --reset-values -f mathoid.yaml [namespace: mathoid, clusters: codfw]
  • 12:08 Amir1: EU SWAT is done
  • 12:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change_tag in s7 (T194164) (duration: 00m 50s)
  • 12:03 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:03 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 12:03 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/includes/changetags/ChangeTags.php: SWAT: Avoid fatals when the filter tags is empty (T194164) (duration: 00m 50s)
  • 12:03 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --set main_app.limits.memory=1G [namespace: mathoid, clusters: codfw]
  • 12:02 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --set main_app.limits.memory=1g [namespace: mathoid, clusters: codfw]
  • 11:49 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Re-enable search integration for ArticlePlaceholder (T195751) (duration: 00m 50s)
  • 11:38 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Translate on idwikimedia (T204292) (duration: 00m 49s)
  • 11:32 banyek: the binlog purging stopped on pc2004 (T206740)
  • 11:27 akosiaris: upgrade mathoid chart to version 0.0.12
  • 11:26 akosiaris@deploy1001: scap-helm mathoid finished
  • 11:26 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 11:26 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: codfw]
  • 11:24 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for editathon at University of North Carolina at Charlotte (T207043) (duration: 00m 49s)
  • 11:18 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for WMCL Editathon (T206914) (duration: 00m 49s)
  • 11:09 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for "Night of the Digital Language" (T206408) (duration: 00m 49s)
  • 11:05 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Remove expired throttle rule (T207015) (duration: 00m 50s)
  • 11:02 banyek: truncating tables in parsecache@pc2004 (T206740)
  • 10:52 moritzm: rolling reboot of thumbor in eqiad for kernel security updates
  • 10:50 godog: run puppet on scb to deploy db configuration for recommendation-service
  • 10:37 banyek: stopping pc2005 -> pc1005 replication (T206740)
  • 10:37 banyek: stopping pc2006 -> pc1006 replication (T206740)
  • 10:22 jynus: running database maintenance tasks on cumin1001, expect very high memory usage
  • 09:53 akosiaris: upload blubber_0.6.0-1_amd64 to apt.wikimedia.org/jessie-wikimedia/main and apt.wikimedia.org/stretch-wikimedia/main T206766
  • 09:03 moritzm: rolling reboot of thumbor in codfw for kernel security updates
  • 08:56 banyek: stopping pc2004 -> pc1004 replication (T206740)
  • 08:42 moritzm: removed mwmaint1001 from debmonitor (T192457)
  • 07:46 akosiaris: upgrade apertium-apy throughout the fleet T199447
  • 07:22 akosiaris: upload apertium-apy_0.11.4-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main T199447
  • 07:20 akosiaris@deploy1001: scap-helm mathoid finished
  • 07:19 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 07:19 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: codfw]
  • 07:19 akosiaris@deploy1001: scap-helm mathoid upgrade [namespace: mathoid, clusters: codfw]
  • 07:17 moritzm: installing net-snmp security updates
  • 06:32 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable reading from new backend of change_tag in s7" (T194164) (duration: 00m 50s)
  • 06:05 jynus: stopping db1092 and db1087 in sync T206743
  • 05:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1092 BBU comments after BBU replacement (duration: 00m 52s)
  • 00:23 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c81dd9e]: Redeploy Updater for removal of props channel (duration: 10m 21s)
  • 00:13 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c81dd9e]: Redeploy Updater for removal of props channel

2018-10-15

  • 20:52 arlolra: Updated Parsoid to 8f3ff40 (T205642, T206003, T187848, T205455, T205743)
  • 20:37 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@834d00a]: Update mobileapps to c2a4ef9 (T206701 T206467 T168875) (duration: 03m 47s)
  • 20:34 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@834d00a]: Update mobileapps to c2a4ef9 (T206701 T206467 T168875)
  • 20:32 arlolra@deploy1001: Finished deploy [parsoid/deploy@b758124]: Updating Parsoid to 8f3ff40 (duration: 11m 43s)
  • 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@b758124]: Updating Parsoid to 8f3ff40
  • 19:37 mforns@deploy1001: Finished deploy [analytics/refinery@3f4adf8]: deploy refinery together with source version 0.0.78 without all removed old jars (duration: 05m 18s)
  • 19:33 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Redeploy 1010 (duration: 00m 28s)
  • 19:33 smalyshev@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Redeploy 1010
  • 19:32 mforns@deploy1001: Started deploy [analytics/refinery@3f4adf8]: deploy refinery together with source version 0.0.78 without all removed old jars
  • 19:27 mforns@deploy1001: Finished deploy [analytics/refinery@1fc53d9]: deploy refinery together with source version 0.0.78 (duration: 15m 56s)
  • 19:11 mforns@deploy1001: Started deploy [analytics/refinery@1fc53d9]: deploy refinery together with source version 0.0.78
  • 18:59 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change_tag in s7 (T194164) (duration: 00m 49s)
  • 18:59 mutante: LDAP - added crusnov to wmf and ops groups
  • 18:51 tgr: pulled gerrit 467315 to mwdeploy1001 (no-op, no scap needed)
  • 18:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: GUI updates and new Updater build (duration: 13m 57s)
  • 18:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cswikivoyage has HD logo even the project doesnt exist (T207066) (duration: 00m 49s)
  • 18:39 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AICaptcha data collection (T186244) (duration: 00m 49s)
  • 18:33 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix a typo in wgLogoHD (mapwiki => napwiki) T207056, Remove techcomwikis row in wgLogo, techcomwiki doesnt exist T207056 (duration: 00m 48s)
  • 18:33 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: GUI updates and new Updater build
  • 18:30 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Beta: Show share button on mobile web for beta user (no-op) (duration: 00m 49s)
  • 18:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT enable senses on testwikidatawiki T203887 (duration: 00m 49s)
  • 18:10 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: SWAT: T207019 Enable WBQualityConstraintsSuggestionsBetaFeature on testwikidatawiki (duration: 00m 49s)
  • 18:01 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009) (duration: 02m 11s)
  • 17:59 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009)
  • 17:57 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009) (duration: 02m 10s)
  • 17:55 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009)
  • 16:54 marostegui: Start replication on db1087 and db1092 to avoid them lagging behind the whole night (nothing running there at this time)
  • 16:36 cmjohnson1: replacing pem0 on asw2-a7-eqiad T206972
  • 16:18 _joe_: restart prometheus-mcrouter-exporter.service across the fleet
  • 15:39 marostegui: Stop MySQL and poweroff db1092 for BBU replacement - T205514
  • 15:31 andrewbogott: restarting slapd on seaborgium as a test for T205463
  • 15:14 cmjohnson1: replacing optics asw2-b fpc2 -fpc8
  • 15:13 mforns@deploy1001: Finished deploy [analytics/refinery@9b288c5]: deploy refinery together with source version 0.0.77 (duration: 20m 19s)
  • 14:53 mforns@deploy1001: Started deploy [analytics/refinery@9b288c5]: deploy refinery together with source version 0.0.77
  • 14:46 marostegui: Ease consistency replication options on db2048 to mitigate lag
  • 14:29 moritzm: rebooting backup2001 for some tests
  • 13:35 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting MCR migration stage to write-both/read-new on Commons (T198308) (duration: 00m 49s)
  • 13:32 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: adding db2096 to hosts (and repooling db2069) (duration: 00m 49s)
  • 13:30 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T206593: adding db2096 to hosts (and repooling db2069) (duration: 00m 49s)
  • 13:16 jynus: stopping db1092 and db1087 in sync T206743
  • 13:10 Jeff_Green: authdns-update to deploy saiph->frpig2001 rename
  • 13:02 godog: upload prometheus-statsd-exporter 0.7.0 - T205870
  • 12:45 banyek: rebooting db2096
  • 12:44 gehel: resetting kafka offsets on wdqs public cluster
  • 12:44 elukey: complete rolling restart of eventbus on kafka[12]00[1-3] for python security upgrades (only codfw was done)
  • 12:41 elukey: upgrade prometheus-memcached-exporter on swift and thumbor
  • 11:57 Amir1: start of mwscript deleteLocalPasswords.php --delete --batch-size 200 on all wikis
  • 11:38 zeljkof: EU SWAT finished
  • 11:29 hoo: Started rebuildItemsPerSite on mwmaint1002 (T44325). Can be killed at any time, if necessary.
  • 11:26 zfilipin@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from ct_tag_id in s7 (T194164) (duration: 00m 49s)
  • 10:57 moritzm: installing ghostscript security updates for jessie
  • 10:47 moritzm: installing tomcat7 security updates
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 09:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 for recloning - T206743 (duration: 00m 49s)
  • 09:45 marostegui: Stop MySQL on db1116:3318 to reclone db1092
  • 09:41 banyek: max_binlog_size is set back to 1048576000 on ParseCache hosts (T206740)
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1104 (duration: 00m 49s)
  • 09:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1104 (duration: 00m 48s)
  • 08:58 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: depooling db2069 (duration: 00m 48s)
  • 08:50 elukey: restart hadoop yarn resource managers on an-master* to pick up new jvm settings
  • 08:49 XioNoX: repool eqsin - T206861
  • 08:48 banyek: depooling db2033 (T206593)
  • 08:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 - T206743 (duration: 00m 49s)
  • 08:17 moritzm: installing imagemagick security update
  • 07:57 godog: reformat ms-be2040 with crc=1 finobt=0 - T199198
  • 07:32 banyek: reimaging db2096(T206593)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 - T206743 (duration: 00m 48s)
  • 07:15 marostegui: Stop MySQL at db1116:3318 to clone db1104
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 - T206743 (duration: 00m 49s)
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 49s)
  • 06:55 XioNoX: add v6 monitoring for mr1-ulsfo OOB - T206778
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for db1109 (duration: 00m 49s)
  • 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 (duration: 00m 50s)
  • 05:20 kartik@deploy1001: Finished deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550) (duration: 04m 25s)
  • 05:16 kartik@deploy1001: Started deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550)
  • 05:16 marostegui: Stop MySQL on db1109 for recloning - T206743
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 50s)
  • 05:11 marostegui: Stop MySQL on db1116:3318 to use it to clone db1109
  • 03:18 kartik@deploy1001: Finished deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616) (duration: 04m 44s)
  • 03:14 kartik@deploy1001: Started deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616)
  • 00:45 krinkle@deploy1001: Synchronized multiversion/MWRealm.php: I79fb3d194a58: use env.php (duration: 00m 49s)
  • 00:08 krinkle@deploy1001: Synchronized wmf-config/: I79fb3d194a: add env.php file (not yet used) (duration: 00m 50s)

2018-10-14

  • 23:42 krinkle@deploy1001: Synchronized multiversion/getMWVersion: Ice9a74e73481 no-op (duration: 00m 49s)
  • 23:21 krinkle@deploy1001: Synchronized wmf-config/ProductionServices.php: If4d8faa4 (duration: 00m 48s)
  • 21:48 krinkle@deploy1001: Synchronized multiversion/MWMultiVersion.php: I83b2bdd53c13e (duration: 00m 50s)
  • 20:47 krinkle@deploy1001: Synchronized wmf-config/import.php: beta-only (duration: 00m 54s)
  • 16:34 volans: forcing a puppet run on all eqsin hosts with batch 1 to clear most of the alarms - T206861
  • 08:54 elukey: restart Yarn resource manager on an-master1002 to force an-master1001 to take the leadership back - T206943
  • 08:34 elukey: powercycle restbase1015 (frozen, no ssh, no metrics, no root console via serial available)
  • 00:48 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/CentralAuth/includes/specials/SpecialGlobalGroupMembership.php: T203767 - If2bfa092b (duration: 00m 50s)

2018-10-13

  • 23:37 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T45086 - I4857e8ac (duration: 00m 51s)
  • 03:07 bblack: eqsin repooled

2018-10-12

  • 18:56 brion: restarted vp9 background transcodes in eqiad, via mwmaint1002
  • 18:37 addshore: modified attachLatest.php script finished running over 9395 pages T206743
  • 18:25 addshore: running modified attachLatest.php script over ~9000 pages on wikidatawiki (with added wait for slaves) T206743
  • 15:50 mutante: repair /dev/sde1 on ms-be2041 - T199198
  • 15:48 mutante: repair /dev/sdh1 on ms-be1043 - T199198
  • 14:23 _joe_: depooling eqsin via geodns due to loss of power redundancy
  • 13:35 gehel: repooling wdqs1003, caught up on lag
  • 12:59 gehel: depooling wdqs1003 to catch up on lag
  • 12:20 bblack: uploading gdnsd 2.99.9942-beta-1+wmf1 to stretch-wikimedia
  • 10:51 _joe_: depooling mw2252 for mcrouter tests T203786
  • 10:27 hoo: Updated the Wikidata property suggester with data from Monday's JSON dump and applied the T132839 workarounds
  • 10:08 addshore@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikimediaEvents/extension.json: T205283 gerrit:466843 Update Schema:WMDEBannerEvents rev to 18437830 (duration: 00m 52s)
  • 09:01 elukey: rolling restart of eventbus on kafka[1,2]00[1-3] to pick up python security upgrades
  • 05:54 moritzm: installing git security updates on trusty
  • 02:25 ejegg: updated fundraising tools from 3754f32 to 5a2d39b

2018-10-11

  • 23:33 Reedy: ran mwscript extensions/ShortUrl/populateShortUrlTable.php --wiki=gomwiki T206741
  • 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable shorturl on gomwiki (duration: 00m 48s)
  • 23:30 Reedy: created shorturl table on gomwiki T206741
  • 23:26 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable FileExporter to Meta-Wiki (duration: 00m 49s)
  • 23:21 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable CongressLookup (duration: 00m 49s)
  • 23:05 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/jobqueue/jobs/ThumbnailRenderJob.php: T203135 - Ib4640e (duration: 00m 49s)
  • 22:56 dzahn@neodymium: conftool action : set/pooled=inactive; selector: name=mwmaint1001.eqiad.wmnet
  • 22:53 mutante: netbox - correction, mwmaint1001 to status "Staged", following new lifecycle docs T192457
  • 22:50 mutante: netbox - renamed mwmaint1001 to mw1279, changed status to inventory, renamed in DNS - T192457
  • 22:45 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/Revision/RenderedRevision.php: I553dba13486 (duration: 00m 51s)
  • 22:30 mutante: mwmaint1001 - shutting down after final backup of /home, renaming back to mw1297 in DNS and DHCP, and reinstalling (T192457)
  • 21:53 mutante: mwmaint1001 - scheduled downtime, is being renamed back to mw1297 and reinstalled
  • 21:47 mutante: mwmaint2001 - rsyncing home dirs from mwmaint1002 to /root/home-mwmaint1002 (which includes home-terbium even!) in case anyone is missing anything from one of mwmaint*
  • 21:41 mutante: mwmaint2001 - deleting 60G of unneeded files from home
  • 20:37 XioNoX: add IPv6 to mr1-ulsfo OOB - T206778
  • 18:46 sbisson@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/: SWAT: Handle page that are unnominated for deletion (duration: 00m 50s)
  • 18:34 sbisson@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/modules/ext.pageTriage.views.list/ext.pageTriage.listControlNav.js: SWAT: Default to deleted and others when no type is selected on mode switch (duration: 00m 50s)
  • 18:22 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove config for RCFilters variables being removed from Core (duration: 00m 49s)
  • 18:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and db2085:3318 (duration: 00m 48s)
  • 18:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 49s)
  • 18:09 sbisson@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add copyviobot group management to relevant wikis (duration: 00m 49s)
  • 17:36 gehel: repooling wdqs1003, caught up on lag
  • away: automated binlog purging started on pc2004, pc2005, pc2006
  • 16:54 gehel: depooling wdqs1003 to let it catch up on lag
  • 15:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 50s)
  • 15:12 marostegui: Stop MySQL on db2085:3318 to reclone db1101:3318 - T206743
  • 15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 00m 49s)
  • 15:04 akosiaris: Media storage/Swift Swift set to active/passive
  • 15:01 akosiaris: Media storage/Swift Swift set to active/active
  • 14:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 (duration: 00m 48s)
  • 14:52 jynus: deploying wikidata row fix to db1087 with replication enabled
  • 14:47 END: (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) (volans@neodymium)
  • 14:47 START: - Cookbook sre.switchdc.services.02-restore-ttl (volans@neodymium)
  • 14:36 END: (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0) (volans@neodymium)
  • 14:36 Switching: services parsoid, restbase, restbase-async, mobileapps, apertium, citoid, cxserver, eventstreams, graphoid, mathoid, proton, pdfrender, recommendation-api, zotero, eventbus, ores, wdqs, wdqs-internal: codfw => eqiad (volans@neodymium)
  • 14:36 START: - Cookbook sre.switchdc.services.01-switch-dc (volans@neodymium)
  • 14:35 END: (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) (volans@neodymium)
  • 14:30 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T206743: mariadb: Depool db1087 (duration: 00m 49s)
  • 14:30 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 14:28 banyek: depooling db1087 (T206743)
  • 14:15 elukey: reboot eventlog1002 for kernel upgrades
  • 14:15 jynus: applying row filling to (most) eqiad s8 dbs, including the master
  • 14:13 moritzm: install libxml2 security updates on jessie servers
  • 13:55 jynus: recovering rows to db1092
  • 13:26 jynus: filling in missing rows on dbstore1002
  • 13:23 marostegui: Stop MySQL on db2083 to reclone db1116:3318 - T206743
  • 13:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2083 (duration: 00m 49s)
  • 13:20 marostegui: Stop MySQL on db1116:3318 to reclone it from db2083 - T206743
  • 12:43 elukey: upgrade prometheus-memcached-exporter on mc1*
  • 12:38 elukey: upgrade prometheus-memcached-exporter on mc2*
  • 12:15 elukey: upgrade prometheus-memcached-exporter on mc2035
  • 12:14 elukey: upload prometheus-memcached-exporter_0.4.1+git20181010.2fa99eb-1 to (jessie|stretch)-wikimedia
  • 12:12 Amir1: EU SWAT is done
  • 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set some small wikis to read new for change tag backend (T194164) (duration: 00m 50s)
  • 11:10 marostegui: Stop MYSQL on db2085:3318 and db1099:3318 T206743
  • 11:09 marostegui: Stop MYSQL on db2088:3318 and db1099:3318 T206743
  • 11:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085:3318 and db1099:3318 (duration: 00m 49s)
  • 11:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db2085:3318 and db1099:3318 (duration: 00m 49s)
  • 11:07 banyek: binlog expiration set to 60 days on db2045
  • 08:30 banyek: setting up some automated binlog purge mechanism on pc1004,pc1005,pc1006
  • 08:26 jynus: setting up replication from pc2005 -> pc1005 and from pc2006 -> pc1006
  • 08:20 jynus: setting up replication from pc2004 -> pc1004
  • 08:04 banyek: purging binary logs on pc1006
  • 08:04 banyek: purging binary logs on pc1005
  • 08:04 jynus: running /usr/local/bin/mwscript purgeParserCache.php --wiki=aawiki --age=1900800 --msleep 0
  • 08:04 banyek: purging binary logs on pc1004
  • 07:57 gehel: rolling restart blazegraph on wdqs-internal for config change - T206648
  • 07:43 addshore: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/466031 to mwmaint1002 only (increasing tracking of wikidata dispatching) T205865
  • 07:36 elukey: roll restart of aqs on aqs100[4-9] to pick up new Druid settings
  • 06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase db1092 weight (duration: 00m 49s)
  • 05:43 marostegui: Purge binary logs on pc2005 due to disk space issues - T206740
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
  • 05:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 51s)
  • 02:25 krinkle@deploy1001: Synchronized w/static.php: T127233 - Ic6acb70 (duration: 00m 49s)
  • 02:10 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/page/WikiPage.php: T203942 - Ib211d98498f (duration: 00m 49s)
  • 02:07 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/tests/phpunit/includes/page/: Ib211d98498f (duration: 00m 49s)
  • 01:38 krinkle@deploy1001: Synchronized wmf-config/etcd.php: T176370 - I5e7e5d167d517 (duration: 00m 55s)

2018-10-10

  • 23:08 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/maintenance/resources/foreign-resources.yaml: Ic865e7077d (duration: 00m 49s)
  • 22:59 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/MultimediaViewer/: T206099 - I53dbce0a (duration: 00m 49s)
  • 22:43 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/specials/SpecialDeletedContributions.php: T187619 - Ic6b0d8020553 (duration: 00m 48s)
  • 22:41 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ORES/includes/FetchScoreJob.php: T204753 - Icc28230585bc (duration: 00m 49s)
  • 22:25 mutante: icinga1001 - chmod 2710 /var/lib/icinga/rw
  • 22:16 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: T206092 - If607ad111a (duration: 00m 48s)
  • 21:51 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ContentTranslation/specials/SpecialContentTranslation.php: T205433 - Ib34b28 (duration: 00m 49s)
  • 21:48 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/Echo/includes/DiscussionParser.php: T204291 - Ia5323b401b94 (duration: 00m 51s)
  • 21:45 XioNoX: Add icinga1001 to mr* security policies - T206704
  • 20:34 thcipriani: upgrading ci jenkins install on contint1001
  • 20:19 thcipriani: upgrading releases-jenkins jenkins install on releases1001
  • 20:17 thcipriani: upgrading releases-jenkins jenkins install on releases2001
  • 19:58 mutante: icinga - enabled icinga service on icinga1001 (stretch), but all notifications are disabled
  • 19:43 mutante: awight restarted ORES celery workers on ores2003 (~17:00), ores200* (17:05)
  • 19:35 kaldari@deploy1001: Finished scap: (no justification provided) (duration: 22m 05s)
  • 19:13 kaldari@deploy1001: Started scap: (no justification provided)
  • 19:11 kaldari: scap sync to rebuild i18n cache
  • 18:35 XioNoX: disable VC port 1/2 on asw2-c-eqiad:fpc3 (to fpc8)
  • 18:20 otto@deploy1001: Finished deploy [analytics/refinery@28bbee8]: Add accept header to webrequest logs - T170606 (duration: 10m 34s)
  • 18:19 XioNoX: delete sessions to AS6805 on cr2-esams (left AMS-IX)
  • 18:10 otto@deploy1001: Started deploy [analytics/refinery@28bbee8]: Add accept header to webrequest logs - T170606
  • 18:09 otto@deploy1001: Finished deploy [analytics/refinery@4e2d956]: Add accept header to webrequest logs - T170606 (duration: 04m 35s)
  • 18:05 otto@deploy1001: Started deploy [analytics/refinery@4e2d956]: Add accept header to webrequest logs - T170606
  • 17:49 XioNoX: replace 10.195.0.0/25 with 10.195.0.0/24 in prefix-list fundraising-codfw4 on cr1/2-codfw - T206637
  • 16:25 mutante: LDAP - added isaacj to wmf group (for SWAP access, existing shell user since recently) (T206631) (T205840)
  • 16:16 _joe_: restart of now-unused jobqueue redises for stopping the alerts post-switchover
  • 16:09 ejegg: updated CiviCRM from 1165e7ed79 to 4cc21d61c5
  • 15:59 vgutierrez: Uploaded certcentral 0.1 to apt.wikimedia.org (stretch) - T199711
  • 15:55 cmjohnson1: scheduled downtime for host cloudvirt1019 swap raid card T196507
  • 15:35 moritzm: uploaded jenkins 2.138.2 security release to apt.wikimedia.org (jessie/stretch) (T206234)
  • 15:11 _joe_: started again hhvm on mwmaint2001
  • 14:51 ejegg: turned fundraising scheduled jobs back on
  • 14:43 ejegg: turned off fundraising scheduled jobs
  • 14:42 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0) (volans@neodymium)
  • 14:42 START: - Cookbook sre.switchdc.mediawiki.08-restore-ttl (volans@neodymium)
  • 14:42 END: (FAIL) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=99) (volans@neodymium)
  • 14:40 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 14:39 END: (FAIL) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=99) (volans@neodymium)
  • 14:38 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 14:33 oblivian@puppetmaster1001: conftool action : set/weight=15; selector: cluster=api_appserver,service=apache2,dc=eqiad,name=mw123.*
  • 14:31 oblivian@puppetmaster1001: conftool action : set/weight=15; selector: cluster=api_appserver,service=apache2,dc=eqiad,name=mw122.*
  • 14:19 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0) (volans@neodymium)
  • 14:19 START: - Cookbook sre.switchdc.mediawiki.08-update-tendril (volans@neodymium)
  • 14:18 END: (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) (volans@neodymium)
  • 14:18 MediaWiki: read-only period ends at: 2018-10-10 14:18:26.908958 (volans@neodymium)
  • 14:18 START: - Cookbook sre.switchdc.mediawiki.07-set-readwrite (volans@neodymium)
  • 14:18 END: (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) (volans@neodymium)
  • 14:18 START: - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (volans@neodymium)
  • 14:17 END: (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0) (volans@neodymium)
  • 14:17 START: - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (volans@neodymium)
  • 14:17 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-traffic (exit_code=0) (volans@neodymium)
  • 14:15 START: - Cookbook sre.switchdc.mediawiki.04-switch-traffic (volans@neodymium)
  • 14:15 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) (volans@neodymium)
  • 14:14 START: - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (volans@neodymium)
  • 14:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) (volans@neodymium)
  • 14:14 START: - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (volans@neodymium)
  • 14:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) (volans@neodymium)
  • 14:13 MediaWiki: read-only period starts at: 2018-10-10 14:13:46.068081 (volans@neodymium)
  • 14:13 START: - Cookbook sre.switchdc.mediawiki.02-set-readonly (volans@neodymium)
  • 14:10 END: (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) (volans@neodymium)
  • 14:10 START: - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (volans@neodymium)
  • 14:10 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:07 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:07 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:05 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:05 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:01 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:01 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) (volans@neodymium)
  • 14:01 START: - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (volans@neodymium)
  • 14:00 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) (volans@neodymium)
  • 14:00 START: - Cookbook sre.switchdc.mediawiki.00-disable-puppet (volans@neodymium)
  • 12:18 _joe_: decommissioning conf1001-1003: stopping etcd, nginx, and masking both
  • 11:41 jynus: renaming some s3 wiki tables on eqiad master to prevent split brain T184805
  • 11:29 zeljkof: EU SWAT finished
  • 11:26 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Permissions changes on itwikibooks (T206447) (duration: 00m 57s)
  • 10:54 marostegui: Set a replication filter on db1075 (s3 eqiad) to ignore enwikivoyage, cebwiki, shwiki, srwiki & mgwiktionary - T184805
  • 10:49 marostegui@deploy1001: Synchronized dblists/s5.dblist: Update s5.dblist to reflect the wikis moved from s3 - T184805 (duration: 00m 56s)
  • 10:48 marostegui@deploy1001: Synchronized dblists/s3.dblist: Update s3.dblist to reflect the wikis moved to s5 - T184805 (duration: 00m 58s)
  • 09:12 ema: Traffic: move restbase back to eqiad T203777
  • 09:07 ema: Traffic: set services active/active T203777
  • 09:00 ema: Traffic: route esams caches back to eqiad T203777
  • 08:27 moritzm: installing fuse security updates
  • 08:07 ariel@deploy1001: Finished deploy [dumps/dumps@0714a93]: fix adds/changes dumps generation when prev run is missing (duration: 00m 06s)
  • 08:07 ariel@deploy1001: Started deploy [dumps/dumps@0714a93]: fix adds/changes dumps generation when prev run is missing
  • 08:01 moritzm: rolling out debdeploy 0.0.99.6
  • 07:51 elukey: cleaned up some log files from eventlog1002
  • 02:55 ejegg: updated payments-wiki from 1472604b6e to 7fb1aae963
  • 00:19 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: T94522 - I2a0c51bea58 (duration: 00m 56s)
  • 00:15 krinkle@deploy1001: sync-file aborted: T205567 - I75f1eb6dc2cb (duration: 00m 01s)
  • 00:14 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/tests/phpunit/includes/utils/: T94522 - I2a0c51bea58 (duration: 01m 02s)

2018-10-09

  • 22:58 SMalyshev: repooled wdqs2003
  • 22:26 shdubsh: repairing /dev/sdl1 on ms-be2040 - T199198
  • 21:52 bblack: cp1085: varnish backend restart for mbox lag
  • 21:50 mutante: releases1001 - restarted jenkins (it went from 200 -> 503 -> 403); curl localhost:8080 works again after the restart, but the icinga check is still getting 403
  • food: updated fundraising CiviCRM from 7a0d14015e to 1165e7ed79
  • 20:08 mutante: repair /dev/sdg1 on ms-be2041 - T199198
  • 19:37 XioNoX: disable igmp-snooping on asw2-c-eqiad - T201039
  • 19:25 XioNoX: disable igmp-snooping on asw2-b-eqiad - T201039
  • 19:20 XioNoX: bounce igmp-snooping on asw2-b-eqiad
  • 18:24 ottomata: adding Accept header to all varnishkafka generated webrequest logs
  • 17:21 SMalyshev: depooled wdqs2003 again, sigh
  • 13:54 moritzm: rebooting prometheus1004 for kernel security update
  • 13:41 moritzm: rebooting prometheus1003 for kernel security update
  • 13:28 moritzm: rebooting prometheus2004 for kernel security update
  • 13:13 moritzm: rebooting prometheus2003 for kernel security update
  • 12:54 gehel: silencing wdqs-public lag alerts (service still functional, and SLO unclear) - T199228
  • 12:45 moritzm: installing imagemagick security updates
  • 11:47 END: (ERROR) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=2) (volans@neodymium)
  • 11:47 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 11:45 akosiaris: dry-run services switchover from codfw to eqiad in preparation for Thursday
  • 11:37 END: (ERROR) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=2) (volans@neodymium)
  • 11:37 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 11:14 volans: live-test of the inverted switchdc (eqiad->codfw) completed, all good - T203777
  • 11:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0) (volans@neodymium)
  • 11:13 START: - Cookbook sre.switchdc.mediawiki.08-update-tendril (volans@neodymium)
  • 11:12 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 11:11 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0) (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.08-restore-ttl (volans@neodymium)
  • 11:11 END: (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) (volans@neodymium)
  • 11:11 [DRY-RUN]: MediaWiki read-only period ends at: 2018-10-09 11:11:05.042622 (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.07-set-readwrite (volans@neodymium)
  • 11:08 END: (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) (volans@neodymium)
  • 11:08 START: - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (volans@neodymium)
  • 11:07 END: (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0) (volans@neodymium)
  • 11:07 START: - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (volans@neodymium)
  • 11:06 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-traffic (exit_code=0) (volans@neodymium)
  • 11:04 START: - Cookbook sre.switchdc.mediawiki.04-switch-traffic (volans@neodymium)
  • 11:03 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) (volans@neodymium)
  • 11:03 START: - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (volans@neodymium)
  • 11:00 END: (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) (volans@neodymium)
  • 10:59 START: - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (volans@neodymium)
  • 10:56 END: (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) (volans@neodymium)
  • 10:56 [DRY-RUN]: MediaWiki read-only period starts at: 2018-10-09 10:56:12.213026 (volans@neodymium)
  • 10:56 START: - Cookbook sre.switchdc.mediawiki.02-set-readonly (volans@neodymium)
  • 10:53 END: (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) (volans@neodymium)
  • 10:53 START: - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (volans@neodymium)
  • 10:51 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 10:49 onimisionipe: repooling wdqs2001, caught up on lag - T206423
  • 10:48 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 10:47 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 10:41 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 10:40 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) (volans@neodymium)
  • 10:40 START: - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (volans@neodymium)
  • 10:37 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) (volans@neodymium)
  • 10:36 START: - Cookbook sre.switchdc.mediawiki.00-disable-puppet (volans@neodymium)
  • 10:35 onimisionipe: deploying prometheus-blazegraph-exporter 0.6 on all wdqs clusters - T206123
  • 10:34 volans: about to perform live-test of the inverted switchdc (eqiad->codfw), actions will be real but basically noop due to codfw being already active - T203777
  • 09:25 elukey: swapped Hadoop's hive/oozie from analytics1003 to an-coord1001
  • 09:16 ema: restart pybal on lvs1005 to pick up config changes (conf2001 -> conf1004)
  • 09:00 ema: re-enable puppet/pybal on lvs1002, IPv6 connectivity with phab1001 working again T201039
  • 08:16 elukey: update puppet compiler facts
  • 08:06 onimisionipe: depooling wdqs2001 to catch up on lag - T206423
  • 07:03 akosiaris: restart zuul and zuul-merger on contint1001 for the upgrade of zuul to finish
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
  • 05:19 marostegui: Stop MySQL on db1122 for binlog format change, mysql and kernel upgrade
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 59s)
  • 02:41 krinkle@deploy1001: Synchronized wmf-config/profiler.php: T176916 / T206092 - Ie86e88777c48 (duration: 00m 56s)
  • 02:21 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: T176916 - Id79baae90: ensure file exists before Ie86e88777c48 (duration: 00m 57s)
  • 00:04 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/libs/rdbms/database: T201900 - I8ae754a2518 (duration: 00m 59s)

2018-10-08

  • 22:45 XioNoX: increase accepted-prefix-limit for 24115 on cr4-ulsfo
  • 22:41 XioNoX: clear BGP neighbor cr1-eqsin:AS9583 (bgp limit threshold reached)
  • 21:11 ejegg: updated payments-wiki from d623de9494 to 1472604b6e
  • 20:42 gehel: repooling wdqs2003, caught up on lag - T206423
  • 19:41 XioNoX: troubleshooting asw2-b-eqiad with JTAC - T201039
  • 19:08 gehel: depooling wdqs2003 to catch up on lag - T206423
  • 19:00 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable MCR read-new mode on some small wikis (T198308) (duration: 00m 56s)
  • 18:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@bd698bd]: WDQS deployment - New federation whitelist entries (duration: 10m 07s)
  • 18:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@bd698bd]: WDQS deployment - New federation whitelist entries
  • 18:37 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@bd698bd]: WDQS test deployment - New federation whitelist entries(wdqs1009) (duration: 00m 33s)
  • 18:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@bd698bd]: WDQS test deployment - New federation whitelist entries(wdqs1009)
  • 18:36 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Extension:File exporter to mrwikipedia (T206437) (duration: 00m 57s)
  • 16:29 XioNoX: push firewall filter counters on asw2-b-eqiad - T201039
  • 16:28 elukey: restart eventlogging on eventlog1002 for python security upgrades
  • 14:05 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T184805: Revert 'mariadb: Depool db1110 for testing s3 imports' (duration: 00m 57s)
  • 14:03 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T184805: Revert 'mariadb: Depool db1110 for testing s3 imports' (duration: 00m 56s)
  • 13:43 elukey: restart confd on esams nodes to pick up new srv settings
  • 13:41 elukey: restart navtiming.service on webperf1001 to pick up the dns change for etcd
  • 13:39 marostegui: Enable gtid on the following slaves: db2068 db1122 db1117:3323
  • 13:37 elukey: restart confd on all the other eqiad nodes to pick up new srv records
  • 13:32 elukey: restart confd on cp1* to pick up new srv records
  • 13:11 _joe_: purging the dnsrec cache for eqiad,esams etcd client SRV records
  • 13:09 ema: depool eqiad front-edge traffic T201039
  • 13:05 banyek: converting cebwiki.templatelinks to TokuDB on host dbstore1002.eqiad.wmnet (T205544)
  • 13:04 banyek: downtime notifications for dbstore1002 replication threads (T205544)
  • 12:49 banyek: pt-kill-wmf enabled on the wikireplicas (T203674)
  • 11:59 _joe_: restart pybal in esams, after running puppet, to switch etcd cluster used
  • 11:46 _joe_: restart pybal on lvs1001
  • 11:46 addshore: SWAT done
  • 11:45 addshore@deploy1001: Synchronized wmf-config/throttle.php: Add throttle exception for Netherlands Hackathon October 2018 - Wiki Techstorm T206241, and remove other rules. (duration: 00m 56s)
  • 11:39 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix --add-prefix=T202769 # T202769
  • 11:35 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix # Finished, still 111 pages to fix
  • 11:34 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix # Started
  • 11:33 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary # (dryrun, 11529 links to fix, 11529 were resolvable.)
  • 11:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:455249 Use translated MetaNamespace for fy.wiktionary T202769 (duration: 00m 58s)
  • 11:27 addshore@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: gerrit:464890 Remove the "reviewer" group at ruwikisource T205997 (duration: 00m 57s)
  • 10:41 elukey: restart mcrouter on mw2201 with more verbose logging settings as test
  • 09:55 moritzm: installing python3.5/python2.7 security updates
  • 09:51 godog: rebuild sdc sdh sdj sdi on ms-be2041 with crc=1 finobt=0 - T199198
  • 08:20 marostegui: Disable gtid on es2 and es3 eqiad master
  • 08:20 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2001.codfw.wmnet
  • 08:20 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2002.codfw.wmnet
  • 07:50 marostegui: Enabling replication eqiad -> codfw in preparation for DC failover
  • 07:40 marostegui: Disable GTID on s1,s2,s3,s4,s6,s7,s8 eqiad masters in preparation for enabling replication eqiad -> codfw
  • 07:39 _joe_: disabling puppet, doing etcd tests on lvs1006
  • 07:38 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2002.eqiad.wmnet
  • 07:38 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2001.eqiad.wmnet
  • 07:38 gehel: reducing relative weight of wdqs2003 in pybal - T206423
  • 07:27 banyek: enabling wmf-pt-kill on labsdb1010 for the first time
  • 07:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 with low weight - T205514 (duration: 01m 27s)
  • 07:00 moritzm: installing git security updates

2018-10-07

  • 16:40 dereckson: Reset user email for account "Dominic Mayers" (T206421)
  • 16:35 elukey: run a script in tmux (my username) on mw2201 to poll the status of a mcrouter key/route every 10s using its admin api (very lightweight but kill if needed)
  • 14:52 onimisionipe: repooling wdqs2003, caught up on lag; lag issues also seem to be creeping up on wdqs200[1|2]
  • 04:29 SMalyshev: temp depooled wdqs2003
  • 03:12 ejegg: disabled all fundraising scheduled jobs - something that looks like disk issues on civi1001

2018-10-06

  • 21:20 gehel: repooling wdqs2003: caught up on updater lag
  • 20:43 _joe_: restarting apache2 on puppetmaster1001
  • 19:16 onimisionipe: depooling wdqs2003
  • 18:10 elukey: restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue)
  • 17:07 onimisionipe: restarting wdqs-blazegraph on wdqs2003
  • 13:48 bblack: multatuli: update gdnsd package to 2.99.9930-beta-1+wmf1
  • 13:47 bblack: authdns1001: update gdnsd package to 2.99.9930-beta-1+wmf1 (correction to last msg)
  • 13:46 bblack: authdns1001: update gdnsd package to 2.99.9161-beta-1+wmf1
  • 12:57 bblack: rebooting cp1076
  • 12:49 bblack: depool cp1076, apparently has disk issues

2018-10-05

  • 23:50 bblack: <<<<<<< repooling eqiad edge caches, a few days ahead of intended switchback next Weds, to alleviate some traffic engineering concerns over the weekend >>>>>>
  • 20:48 mutante: T191183 - it's still showing the error page as before but that isn't due to apache issues, it just needs additional ferm rules
  • 20:44 mutante: gerrit - adding gerrit.wmfusercontent.org virtual host for avatars. applied first on gerrit2001, then on cobalt (T191183)
  • 20:03 ejegg: updated fundraising CiviCRM from ebc2e0076c to 7a0d14015e
  • 19:48 banyek: repooling labsdb1009 (T195747)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@f8776de]: Redeploy 1009 (duration: 00m 26s)
  • 19:44 smalyshev@deploy1001: Started deploy [wdqs/wdqs@f8776de]: Redeploy 1009
  • 18:37 bblack: authdns2001: upgraded gdnsd to 2.99.9930-beta
  • 18:31 bblack: gdnsd-2.99.9930-beta-1+wmf1 uploaded to stretch-wikimedia
  • 18:26 mutante: icinga - noop on all servers, no change, puppet re-enabled, operations normal
  • 18:08 mutante: disabling puppet on icinga for 5 min for extra safety before a change that should be noop
  • 17:58 banyek: depooling labsdb1009 (T195747)
  • 17:50 banyek: repooling labsdb1011 (T195747)
  • 17:12 elukey: set etcd in codfw as read/write (was readonly) and eqiad as readonly (was read/write)
  • 14:57 banyek: depooling labsdb1011 (T195747)
  • 14:56 banyek: depooling labsdb1011
  • 13:26 banyek: adding wmf-pt-kill_2.2.20-1+wmf3 package for stretch
  • 13:25 moritzm: installing python3.5/2.7 security updates
  • 13:02 volans: upgraded spicerack to version 0.0.9 on sarin/neodymium/cumin* - T199079
  • 12:13 vgutierrez: Creating certcentral1001.eqiad.wmnet in ganeti - T206308
  • 12:12 vgutierrez: Creating certcentral2001.codfw.wmnet in ganeti - T206308
  • 11:59 elukey: deleted bohrium from ganeti via gnt-instance
  • 11:43 moritzm: rebooting wezen for kernel security update
  • 11:29 moritzm: rebooting ruthenium for kernel security update
  • 10:40 jynus: restarting replication on labsdb1010/1 on s3 and s5
  • 10:37 volans: uploaded spicerack_0.0.9-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T199079
  • 10:17 moritzm: rearmed keyholder on netmon2001
  • 10:10 elukey: restart confd on labs-puppetmaster to pick up new etcd settings (eqiad -> codfw)
  • 10:03 _joe_: restarting navtiming.service on webperf1001 to pick up the dns change for etcd
  • 09:37 elukey: restart rsyslog on lithium - broken connection to tegmen - T199406
  • 09:37 banyek: disabling puppet on labsdb1009,labsdb1010,labsdb1011 (T203674)
  • 09:36 banyek: adding wmf-pt-kill_2.2.20-1+wmf2 package for stretch
  • 09:16 volans: rebooting tegmen, console stuck, possible re-occurrence of T199413 (to be confirmed)
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Move some wikis for s3 to s5 (duration: 00m 56s)
  • 09:06 elukey: stop etcdmirror replication on conf2002
  • 09:05 _joe_: restarting confd on all nodes in eqiad and esams
  • 08:58 _joe_: wiped cached values for the read-only etcd SRV record
  • 08:56 _joe_: read-write connections to etcd only go to codfw now
  • 08:35 _joe_: reenabling notifications for etcdmirror on conf1005
  • 08:02 jynus: start replication on db1069 (x1)
  • 07:54 jynus: starting replication on db1075, db1070, db1070:s3 with gtid disabled
  • 07:50 jynus: stopping dbstore1001:x1
  • 07:33 jynus: chaning s3 master for db1070
  • 07:28 jynus: stopping s3 replication on db1070
  • 07:20 jynus: stopping x1 replication on db1069
  • 07:20 godog: temporarily stop prometheus on bast4001 to finalize data transfer - T179050
  • 07:19 jynus: stopping s3 replication on db1075
  • 07:18 jynus: stopping s5 replication on db1070
  • 07:09 moritzm: installing python3.4/2.7 security updates
  • 05:55 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T205599 - Ic28e00c30 (duration: 00m 57s)
  • 05:53 _joe_: upgrading python-etcd on conf1004-6, restarting etcdmirror
  • 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1092 status - T205514 (duration: 00m 57s)
  • 04:18 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/libs/filebackend/FileBackendStore.php: T205567 - I75f1eb6dc2cb (duration: 00m 56s)
  • 04:16 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/CirrusSearch/includes/DataSender.php: I0769c50c (duration: 01m 01s)
  • 00:31 mutante: LDAP: added user skvjold to group wmf (T204377)

2018-10-04

  • 22:51 ejegg: updated fundraising CiviCRM from 944b954bac to ebc2e0076c
  • 21:27 XioNoX: bounce phab1001 switch port - T201039
  • 20:47 ejegg: updated fundraising CiviCRM from ddf4865650 to 944b954bac
  • 20:23 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 00m 17s)
  • 20:22 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 20:10 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 14m 04s)
  • 19:56 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 19:30 marxarelli: rise in fatals "Fatal error: entire web request took longer than 60 seconds and timed out in /srv/mediawiki/php-1.32.0-wmf.24/includes/Title.php"
  • 19:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.24
  • 19:15 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50 (duration: 00m 53s)
  • 19:14 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50
  • 18:49 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:460202 (duration: 00m 59s)
  • 18:24 XioNoX: bounce lvs1002:eth1 switch port
  • 18:23 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable PageTriage/ORES on enwiki (T206149) (duration: 01m 01s)
  • 18:21 bblack: lvs1002: puppet disabled, stopping pybal (fail to 1005)
  • 18:07 _joe_: disabled notifications for etcd replication lag on conf1005, not in production
  • 17:47 banyek: repooling labsdb1010 (T195747)
  • 17:41 _joe_: uploaded new python-etcd packages for jessie, stretch
  • 17:38 XioNoX: asw2-b-eqiad recabling done - T201039
  • 17:34 elukey: pool kafka1002 (eventbus) after maintenance
  • 17:22 elukey: re-enable ircecho after alarms shower
  • 17:15 andrewbogott: triggering some alerts on labvirt1018 to figure out about alert thresholds
  • 17:06 elukey: stop ircecho on einsteinium - alarms shower
  • 17:02 gtirloni: tools - published updated toollabs-* Docker images
  • 16:54 ejegg: updated standalone SmashPig deploy from 82f9d49c23 to 5f21d3f2db
  • 16:52 XioNoX: Step 3) Add missing links - T201039
  • 16:45 shdubsh: etherpad1001 running systemctl reset-failed
  • 16:41 XioNoX: Connect/enable fpc2:0/51-fpc5:1/0 (5m DAC) - T201039
  • 16:39 XioNoX: Enable fpc5-fpc7 - T201039
  • 16:33 twentyafterfour: started phd on phab1001 and re-enabled puppet (I had it disabled to prevent starting phd during read-only)
  • 16:25 twentyafterfour: phabricator is read-write
  • 16:21 jynus: reloading dbproxy1003,8
  • 16:16 marostegui: Stop and reboot db1072 (phabricator master) for maintenance
  • 16:16 twentyafterfour: phabricator is read-only
  • 16:14 XioNoX: Enable all VC ports on FPC2 and FPC7 - T201039
  • 16:13 XioNoX: starting asw2-b-eqiad re-cabling - T201039
  • 16:08 twentyafterfour: logged downtime for phabricator in icinga, stopped phd queue processing in preparation for read-only mode
  • 16:07 jynus: reloading haproxy @ dbproxy1005
  • 16:00 marostegui: Stop MySQL on db1073 for mariadb and kernel upgrade - T201039 T148507
  • 15:58 arturo: icinga downtime every server in the main cloudvps deployment for 2h T201039
  • 15:56 arturo: icinga downtime every server with the cloudXXXX scheme for 2h T201039
  • 15:54 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444 (duration: 00m 55s)
  • 15:53 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444
  • 15:52 ppchelko@deploy1001: Finished deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444 (duration: 01m 40s)
  • 15:51 ppchelko@deploy1001: Started deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444
  • 15:41 elukey: depool kafka1002 from eventbus as precautionary step for T201039
  • 14:48 banyek: depooling labsdb1010 (T195747)
  • 14:09 marostegui: Sanitize enwikivoyage cebwiki shwiki srwiki mgwiktionary on db1124:3315 T184805
  • 13:46 pmiazga@deploy1001: Finished deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158) (duration: 02m 55s)
  • 13:43 pmiazga@deploy1001: Started deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158)
  • 13:14 banyek: muting alerts on s2replication @dbstore2002 and resuming compression of s2 database tables (T204930)
  • 13:14 banyek: muting alerts on dbstore2002 and resuming compression of s2 database tables (T204930)
  • 12:23 elukey: deploy etcdmirror on conf1005 - T205814
  • 12:06 zeljkof: EU SWAT finished
  • 12:06 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add permission "move-rootuserpages" to usergroup "eliminator" at ptwiki (T205595) (duration: 00m 57s)
  • 12:01 moritzm: rolling reboot of ms-fe hosts in codfw for kernel security update
  • 12:00 zeljkof: one more patch for EU SWAT
  • 11:57 zeljkof: EU SWAT finished
  • 11:57 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add *.nasimonline.ir to wgCopyUploadsDomains whitelist for Commons (T203371) (duration: 00m 56s)
  • 11:52 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: add Radlines.org to $wgCopyUploadsDomains (T203219) (duration: 00m 57s)
  • 11:42 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add .bollywoodhungama.in to wgCopyUploadsDomains (T203363) (duration: 00m 57s)
  • 11:35 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add some namespaces aliases for zhwikiversity (T201675) (duration: 00m 57s)
  • 11:27 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change acewiki default time zone to Asia/Jakarta (T205693) (duration: 00m 56s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org (T205747) (duration: 00m 57s)
  • 10:44 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.23/README: noop sync to verify that scap 3.8.7-1 works (at least on a basic level) (duration: 00m 59s)
  • 10:38 godog: upload scap 3.8.7-1 - T204383
  • 10:36 _joe_: uploading etcd-mirror to stretch-wikimedia T205814
  • 10:08 moritzm: rolling reboot of ms-fe hosts in eqiad for kernel security update
  • 09:13 arturo: T203177 schedule 8h icinga downtime for cloudcontrol1003,1004 and labmon1001
  • 08:52 moritzm: installing python2.7/python3.4/python3.5 security updates on jessie/stretch
  • 08:34 moritzm: installing ca-certificates updates for jessie/stretch
  • 08:09 marostegui: Restart icinga T196336
  • 08:00 gehel: re-enabling puppet on maps1004
  • 07:31 elukey: move Piwik/Matomo from bohrium to matomo1001 - T202962
  • 07:25 godog: reformat ms-be1041 with crc=1 finobt=0 - T199198
  • 06:57 jynus: starting multisource replication of s3 from s5 at eqiad master
  • 06:51 jynus: reenabling consistency configuration on s5 replica databases
  • 06:24 jynus: create manual backup of databases on eqiad s6, s7, s8, x1
  • 05:36 marostegui: Deploy schema change on db2048 (s1 master) - T205913
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2062 (duration: 00m 56s)
  • 05:30 marostegui: Deploy schema change on db2062 - T205913
  • 05:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 00m 57s)
  • 04:04 SMalyshev: repooled wdqs2003
  • 03:22 SMalyshev: depool wdqs2003 to let it catch up
  • 03:21 SMalyshev: repooled wdqs2001
  • 03:16 ejegg: re-enabled PayPal EC orphan rectifier
  • 03:06 ejegg: updated CiviCRM from 80cb98e33e to ddf4865650
  • 02:43 SMalyshev: depooled wdqs2001 to see if it catches up faster
  • 01:54 ejegg: updated payments-wiki from 8b673cfb4f to d623de9494

2018-10-03

  • 23:54 mutante: scheduled downtime for wdqs as it's flapping and already known
  • 23:45 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081); ext.visualEditor.mwlanguage: Actually load all of the code (T205834) (duration: 00m 57s)
  • 23:41 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081) (duration: 00m 59s)
  • 23:29 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Hide copyvio AFC filter option behind flag (T205918) (duration: 00m 57s)
  • 23:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: Make UID clock drift error have more details (T94522) (duration: 00m 58s)
  • 23:20 XenoRyet: shut off Paypal orphan rectifier
  • 23:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump Minerva A/B test rates to 100% on jawiki, ruwiki, fawiki (T200792) (duration: 00m 56s)
  • 22:49 shdubsh: re-enable puppet on einsteinium
  • 22:45 shdubsh: einsteinium: setting enable_notifications=1 and reloading icinga
  • 22:36 herron: herron@neodymium:~$ sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 22:20 shdubsh: einsteinium: setting enable_notifications=0 and starting icinga
  • 22:06 herron: herron@neodymium:~$ sudo cumin -b 40 -p 95 'R:file = /etc/nagios/nrpe_local.cfg' run-puppet-agent
  • 22:02 mutante: mw2242 - started nagios-nrpe-server
  • 22:01 shdubsh: icinga stopped manually
  • 21:57 mutante: einsteinium - disabling puppet
  • 21:25 bblack: upgraded gdnsd to 2.99.9161 on authdns1001
  • 21:17 dduvall@deploy1001: Synchronized php: group1 wikis to 1.32.0-wmf.24 (duration: 00m 55s)
  • 21:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.24
  • 21:12 dduvall@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikibaseQualityConstraints/src/ServiceWiring.php: deploying fix to 1.32.0-wmf.24 for T206161 (duration: 00m 57s)
  • 20:28 marxarelli: deployed proposed WikibaseQualityConstraints fix and wikiversions bump for wikidatawiki to mwdebug1001 and mwdebug1002 for verification (T206161)
  • 20:18 robh: optic swap on cr4-ulsfo:et-0/0/1
  • 20:03 bblack: upgraded gdnsd to 2.99.9161 on multatuli
  • 19:40 bblack: upgraded gdnsd to 2.99.9161 on authdns2001
  • 19:35 bblack: uploaded 2.99.9161-beta-1+wmf1 to stretch-wikimedia
  • 19:33 mateusbs17: running initial osm import in maps1004
  • 19:23 dduvall@deploy1001: Synchronized php: rollback group1 to 1.32.0-wmf.23 (duration: 00m 54s)
  • 19:18 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback group1 to 1.32.0-wmf.23
  • 19:15 marxarelli: rolling back group1 after rapid rise in fatals
  • 19:14 dduvall@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:49 RoanKattouw: Deployed patches for T206130
  • 18:36 papaul: reinstalling OS on lvs2010
  • 18:16 mutante: lvs2010 - scheduled downtime for host and services for 12 hours for reinstall
  • 18:09 mutante: lvs2009 - schedule downtime in icinga for 4 hours, reinstall in progress
  • 18:08 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20 (duration: 00m 57s)
  • 18:07 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20
  • 18:07 XioNoX: disable ulsfo Zayo transit/transport links
  • 17:42 XioNoX: re-enable cr1-eqiad:ae1 - T201145
  • 17:28 XioNoX: start of recabling asw2-a-eqiad between asw and cr1 - T201145
  • 17:26 XioNoX: disable cr1-eqiad:ae1 - T201145
  • 17:10 papaul: reinstalling OS on lvs2009
  • 16:24 reedy@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/Flow/: fixup flow exporting T203424 (duration: 01m 03s)
  • 15:45 ejegg: updated fundraising CiviCRM from e3e1963915 to 80cb98e33e
  • 14:42 jynus: fixed some prometheus metrics grants on dbstore1001:3306, db1116:3317 and db1116:3318
  • 14:07 banyek: converting wikidatawiki.change_tag to TokuDB on host dbstore1002 (T205544)
  • 12:54 urandom: DROP unused RESTBase tables - T204752
  • 12:26 stephanebisson: Finished mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 12:12 stephanebisson: Starting mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 11:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don't purge articlequality, draftquality scores (T203286) (duration: 00m 57s)
  • 11:45 banyek: converting enwiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:42 pmiazga@deploy1001: Synchronized wmf-config: SWAT: Remove dead config relating to wgRelatedArticlesEnabledBucketSize (T202306) (duration: 00m 57s)
  • 11:38 arturo: downtime cloudcontrol1003,1004 for 2h for T203177
  • 11:30 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create eliminator group at Vietnamese Wikibooks (T202207) (duration: 00m 58s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix a typo in zhwikiversitys importsources definition (T201328) (duration: 00m 57s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Fix a typo in lift account creation cap for cswiki event (T206119) (duration: 00m 56s)
  • 10:41 jynus: start compressing dbstore1001:x1 tables
  • 09:26 jynus: reducing IO overhead temporarily in exchange for crash safety for s5 replicas T184805
  • 09:23 jynus: fixing replication filters on dbstore1002 (again)
  • 08:34 jynus: fixing replication filters on dbstore1002
  • 08:18 jynus: starting importing of certain s3 wikis into eqiad s5 master T184805
  • 07:51 jynus: deploying replication filters to s5 at labsdb1009/10/11 and dbstore1002 T184805
  • 07:06 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607) (duration: 00m 28s)
  • 07:05 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607)
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2055 (duration: 00m 55s)
  • 06:37 marostegui: Deploy schema change on db2055 - T205913
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2055 (duration: 00m 56s)
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2085:3311 (duration: 00m 56s)
  • 05:59 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07) (duration: 03m 32s)
  • 05:57 marostegui: Deploy schema change on db2085:3311 - T205913
  • 05:56 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07)
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085:3311 (duration: 00m 58s)
  • 05:26 marostegui: Deploy schema change on db1067 (s1 eqiad master), lag will be generated - T205913
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 57s)
  • 05:24 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/languages/Language.php: T206030 - I985dfa3eb17 (duration: 00m 56s)
  • 05:21 marostegui: Deploy schema change on db1075 (s3 eqiad master), lag will be generated - T205913
  • 05:20 marostegui: Deploy schema change on db2070 - T205913
  • 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2070 (duration: 00m 56s)
  • 04:45 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/NavigationTiming: T205580 - I04c52658fbf6d (duration: 01m 03s)
  • 00:42 Amir1: Evening SWAT is done
  • 00:41 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 57s)
  • 00:38 mutante: icinga1001 (not prod yet), removing all icinga packages, running puppet to reinstall them, debugging dpkg issue
  • 00:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 55s)

2018-10-02

  • 23:54 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/i18n/en.json: SWAT: Align copyvio log terminology (T199359) (duration: 00m 56s)
  • 23:38 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/modules/ext.pageTriage.views.list/ext.pageTriage.listControlNav.underscore: SWAT: Hide copyvio, none afc filter options behind flag (T205918) (duration: 00m 56s)
  • 23:33 ejegg: updated fundraising CiviCRM from c353eba283 to e3e1963915
  • 23:26 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ORES/tests/phpunit/includes/HooksTest.php: SWAT: Disable RCFilters in tests (duration: 00m 54s)
  • 23:16 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Fix using the old index when new indexes are not there (T205904) (duration: 00m 57s)
  • 22:53 shdubsh: powercycling icinga1001 after removing problematic entry from fstab
  • 22:26 gtirloni: labstore2003 re-started service block_sync
  • 21:39 XioNoX: Fix unused vlans XLink1/2 on asw2-a5
  • 21:15 banyek: enabling puppet on es2001
  • 21:12 banyek: re-enabling and starting backups on host es2001 (T205257)
  • 21:01 gtirloni: labstore2003 stopped service block_sync
  • 20:15 dduvall@deploy1001: Finished scap: group0 to php-1.32.0-wmf.24 (duration: 33m 00s)
  • 20:04 Jeff_Green: authdns-update to deploy new IP for frbast2001.frack.eqiad.wmnet
  • 19:50 XioNoX: update prefix-list fundraising-codfw-internal4 to /24 on pfw3-codfw - T204271
  • 19:42 dduvall@deploy1001: Started scap: group0 to php-1.32.0-wmf.24
  • 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.32.0-wmf.19 (duration: 07m 25s)
  • 19:21 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 19:19 XioNoX: update fw policies on pfw3-codfw - T204271
  • 18:39 XioNoX: replace 10.195.0.73/29 with 10.195.0.65/28 on pfw3-codfw - T204271
  • 18:26 XioNoX: remove old 10.195.0.65/29 from pfw3-codfw - T204271
  • 18:24 jynus: restarting ferm on dbstore2002 T205257
  • 18:08 arlolra: Updated Parsoid to 65d6f82 (T163438, T205674, T205673)
  • 18:07 ariel@deploy1001: Finished deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting (duration: 00m 06s)
  • 18:07 ariel@deploy1001: Started deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting
  • 18:01 arlolra@deploy1001: Finished deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82 (duration: 10m 44s)
  • 17:51 arlolra@deploy1001: Started deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82
  • 17:37 XioNoX: update NAT for frbast2001 on pfw3-codfw - T204271
  • 17:25 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 17:22 XioNoX: update fw policies on pfw3-codfw - T204271
  • 17:22 andrewbogott: upgraded wikitech-static to remotes/origin/REL1_31
  • 17:18 andrewbogott: upgrading debian packages and MediaWiki version on wikitech-static
  • 16:53 jynus: setup test s3 replication channel on db1110 (filtered)
  • 16:49 XioNoX: assign 10.195.0.129/29 to pfw3-codfw:reth0.2133 - T204271
  • 16:38 cmjohnson1: swapping failed disk db1067 T205780
  • 16:04 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency (duration: 01m 06s)
  • 16:03 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency
  • 15:50 marxarelli: cutting 1.32.0-wmf.24 branch
  • 15:33 gehel: cleanup old cronjob (cleanup GC logs) on all elasticsearch servers
  • 15:24 akosiaris: upgrade mathoid chart version to 0.0.11
  • 15:24 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 15:21 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid upgrade -h [namespace: mathoid, clusters: eqiad,codfw]
  • 14:11 banyek: powering off dbstore2002.codfw.wmnet for BBU change (T205257)
  • 13:47 marostegui: Deploy schema change on s4 eqiad, this will generate lag on eqiad - T205913
  • 13:06 marostegui: Deploy schema change on s7 eqiad, this will generate lag on eqiad - T205913
  • 12:47 banyek: converting enwiki.content to TokuDB on host dbstore1002 (T205544)
  • 12:47 banyek: converting enwiki.contents to TokuDB on host dbstore1002 (T205544)
  • 11:58 banyek: converting wikidatawiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:41 arturo: downtime labstore1007 load check in icinga for 1d
  • 11:21 zeljkof: EU SWAT finished
  • 11:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Use proper index on change_tag table (T205904) (duration: 00m 57s)
  • 10:58 mobrovac@deploy1001: Synchronized rpc/RunSingleJob.php: RunSingleJob: Delay job execution while in read-only mode - T204154 (duration: 00m 57s)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2092 (duration: 00m 56s)
  • 10:24 marostegui: Deploy schema change on db2092 - T203709
  • 10:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2092 (duration: 00m 56s)
  • 09:30 marostegui: Deploy schema change on s2 eqiad master, lag will be generated T205913
  • 08:43 banyek: disabling puppet on es2001 and disabling backups too
  • 08:28 marostegui: Deploy schema change on s6 eqiad master, lag will be generated T205913
  • 08:16 jynus: test recover some s3 wiki data onto db1110 (s5)
  • 08:04 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 56s)
  • 08:04 marostegui: Deploy schema change on s5 eqiad master, lag will be generated T205913
  • 08:01 banyek: converting wikidatawiki.content to TokuDB on host dbstore1002 (T205544)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2071 (duration: 00m 55s)
  • 07:50 marostegui: Deploy schema change on db2071 T205913
  • 07:50 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy (duration: 00m 17s)
  • 07:49 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2071 (duration: 00m 56s)
  • 07:48 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy (duration: 00m 16s)
  • 07:48 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2088:3311 (duration: 00m 56s)
  • 07:36 marostegui: Deploy schema change on db2088:3311 T205913
  • 07:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2088:3311 (duration: 00m 55s)
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2072 (duration: 00m 55s)
  • 07:18 marostegui: Deploy schema change on db2072 T205913
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2072 (duration: 01m 02s)
  • 05:22 _joe_: stopped tilerator on maps1004, was spamming like crazy
  • 01:18 ejegg: updated CiviCRM from e7a620a00c to c353eba283

2018-10-01

  • 23:44 eileen: update process control revision is b9c7ab286e - define but not enable Redis
  • 23:43 foks: disabling 2FA for two users
  • 23:31 twentyafterfour: finished creating database tables
  • 23:18 twentyafterfour: creating ipblocks_restrictions table (command run on mwmaint2001: foreachwiki sql.php maintenance/archives/patch-ipblocks_restrictions-table.sql)
  • 22:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts (duration: 06m 22s)
  • 22:46 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts
  • 22:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts (duration: 03m 57s)
  • 22:41 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts
  • 22:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures (duration: 12m 27s)
  • 22:29 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures
  • 21:17 arlolra: Updated Parsoid to 224ecde (T198504, T133673, T202666)
  • 20:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde (duration: 08m 22s)
  • 20:37 arlolra@deploy1001: Started deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde
  • 20:35 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (duration: 14m 00s)
  • 20:21 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph
  • 19:52 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only) (duration: 00m 30s)
  • 19:51 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only)
  • 19:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040 (duration: 03m 38s)
  • 19:24 ppchelko@deploy1001: Started deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040
  • 19:11 thcipriani: restarting ci jenkins for new plugins
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 20% rate (T200792) (duration: 00m 56s)
  • 18:28 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=enwiki --prefix (T201009)
  • 18:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/maintenance/includes/DeleteLocalPasswords.php: T201009 (duration: 00m 56s)
  • 18:17 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Ensure valid AFC option is selected (T205324, T205168); hide copyvio behind a global var and URL param (duration: 00m 57s)
  • 18:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 5% rate (T200792) (duration: 00m 59s)
  • 17:59 XioNoX: push fw change on pfw3-eqiad - T205888
  • 17:57 XioNoX: push fw change on pfw3-codfw - T205888
  • 17:28 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates (wdqs1009) (duration: 01m 46s)
  • 17:27 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates (wdqs1009)
  • 17:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093, db1064 (duration: 00m 57s)
  • 17:02 jynus: stopping some mariadb instances on dbstore1001 and starting compression T201392
  • 16:26 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@58f9ed3]: Fix KafkaConsumer not connected error
  • 15:16 jynus: stopping db1064 to clone it to dbstore1001
  • 15:00 akosiaris: upgrade etherpad to 1.7.0-2
  • 14:14 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting MCR migration stage to write-both/read-new on mediawikiwiki (T198308) (duration: 00m 56s)
  • 13:51 banyek: Downtimed the slave lag monitoring on dbstore1002 while the tables are being converted (T205544)
  • 12:38 akosiaris: upload hfst_3.13.0~r3461-1+wmf2 to apt.wikimedia.org/jessie-wikimedia/main. T199962
  • 12:26 banyek: converting enwiki.categorylinks to TokuDB on host dbstore1002 (T205544)
  • 12:19 banyek: stopping replication on s2@dbstore2002 while the tables are being compressed (T204930)
  • 12:15 banyek: enabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 12:13 zeljkof: EU SWAT finished
  • 12:12 zfilipin@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/ContentTranslation/: SWAT: Fix error in CXTransclusionNode#afterRender method (T205521) (duration: 00m 59s)
  • 11:56 jynus: stopping db1093 to clone it to dbstore1001
  • 11:52 arturo: install prometheus-openstack-exporter 0.0.8-3 in reprepro T203177
  • 11:41 zfilipin@deploy1001: Synchronized wmf-config: SWAT: Remove unused default source language config for CX (duration: 00m 57s)
  • 11:16 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2058 (duration: 00m 55s)
  • 11:09 _joe_: killed bash runner.sh by user ladsgroup on mwmaint2001
  • 10:58 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2058 (duration: 00m 57s)
  • 10:52 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093, db1064 (duration: 00m 57s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:21 godog: repair /dev/sdf1 /dev/sde1 on ms-be1041 - T199198
  • 10:15 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --prefix on all CentralAuth wikis (T201009)
  • 10:10 Amir1: mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --delete (T201009)
  • 09:33 godog: test formatting sdh and sdi on ms-be2040 with crc=0 - T199198
  • 09:15 volans: Set Racktables in read-only mode - T199083
  • 08:56 _joe_: rolling restart of parsoid in codfw; afterwards, parsoid will connect to the MediaWiki API via HTTPS
  • 08:54 _joe_: rolling restart of parsoid in eqiad
  • 07:54 banyek: disabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 07:00 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462) (duration: 00m 16s)
  • 07:00 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462)
  • 06:39 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462) (duration: 00m 19s)
  • 06:39 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462)
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 56s)
  • 05:19 marostegui: Stop replication on dbstore1002 and db1103:3312 in sync
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 01s)
  • 05:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b (duration: 03m 18s)
  • 05:15 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b
  • 05:07 marostegui: Deploy schema change on s1 codfw master - T203709
  • 03:21 onimisionipe: restarting inplace reindexing of enwiki and viwiki at codfw - T204362


Archives

See Server admin log/Archives.