Server Admin Log
2020-02-07
- 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
2020-02-06
- 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T244133 [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
- 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T159711 T161365 T164435 [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
- 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
- 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T244405 Don't try to assign to if it's unset (duration: 01m 07s)
- 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: T242184 Change tags method so anon edits will go through (duration: 01m 08s)
- 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is (T242606)
- 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs T233866
- 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 20:52 akosiaris: restart all wikifeeds pods
- 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
- 20:45 akosiaris: restart restbase on restbase1027
- 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
- 20:30 twentyafterfour: sync-wikiversions --force
- 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks (http://en.wikipedia.org)
- 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18 refs T233866
- 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T244405 Set wgLogoHD before adding wordmark (duration: 01m 06s)
- 19:36 bblack: re-pool cp1075 (eqiad text)
- 19:33 addshore: SWAT done!
- 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: T244479 Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
- 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
- 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: T243713 Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
- 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: T243713 Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
- 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
- 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
- 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of I60692f0e8 T237587 cdanis'"
- 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
- 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere T243395, sync again for luck (duration: 01m 06s)
- 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of I60692f0e8 T237587 cdanis'"
- 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere T243395 (duration: 01m 07s)
- 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 T243395 (duration: 01m 10s)
- 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
- 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
- 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to ceeb950 (duration: 06m 27s)
- 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to ceeb950
- 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
- 17:32 herron: set performance cpu scaling governor on maps*
- 16:49 vgutierrez: pooling ncredir5002 running buster - T243391
- 16:38 vgutierrez: pooling cp4023 with buster - T242093
- 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
- 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
- 16:35 XioNoX: remove AS prepending in esams/knams
- 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - T180069
- 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - T180069
- 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - T180069
- 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - T180069
- 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
- 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
- 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - T180069
- 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - T180069
- 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - T180069
- 16:07 vgutierrez: depool and reimage ncredir5002 as buster - T243391
- 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - T180069
- 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - T180069
- 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - T180069
- 16:03 vgutierrez: depool & reimage cp4023 as buster - T242093
- 16:03 vgutierrez: pooling cp4024 with buster - T242093
- 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
- 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
- 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: T242705 (duration: 04m 35s)
- 15:56 vgutierrez: pooling ncredir4001 running buster - T243391
- 15:55 moritzm: installing qemu security updates
- 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - T180069
- 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - T180069
- 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: T242705
- 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - T180069
- 15:50 moritzm: installing python-ecdsa security updates
- 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 15:41 moritzm: installing jsoup security updates
- 15:30 vgutierrez: depool & reimage ncredir4001 as buster - T243391
- 15:29 vgutierrez: depool & reimage cp4024 as buster - T242093
- 15:28 vgutierrez: pooling ncredir4002 running buster - T243391
- 15:27 moritzm: installing sudo security updates on jessie
- 15:23 vgutierrez: pooling cp4025 with buster - T242093
- 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ T241145
- 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 14:59 godog: extend graphite1004 / graphite2003 fs +200G
- 14:56 vgutierrez: depool and reimage ncredir4002 as buster - T243391
- 14:46 vgutierrez: depool & reimage cp4025 as buster - T242093
- 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
- 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump T241145
- 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - T243963
- 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
- 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
- 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
- 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
- 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
- 13:31 XioNoX: reboot cr3-knams
- 13:31 elukey: depool mw1347 to test some mcrouter settings
- 13:27 XioNoX: deactivate BGP transits on cr3-knams
- 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - T244464
- 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
- 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
- 13:00 addshore: SWAT done
- 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
- 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
- 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 T243395, due to T244479 (duration: 01m 07s)
- 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 T243395 (duration: 01m 06s)
- 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API (T243726) (duration: 01m 07s)
- 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API (T243726) (duration: 01m 04s)
- 12:40 vgutierrez: pooling cp3065 - T242093
- 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 T243395 (duration: 01m 07s)
- 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
- 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
- 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
- 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
- 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5e1cbb2: Enable CX in te, kn, gu, mr and pawiki as a default tool (T243271, T243272, T243273, T243274, T243275) (duration: 01m 09s)
- 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
- 11:38 kart_: Updated cxserver to 2020-02-05-051751-production (T244230, T234323)
- 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
- 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
- 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
- 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
- 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
- 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
- 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - T244464
- 10:00 vgutierrez: depool and reimage cp3065 as buster - T242093
- 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - T242093
- 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b (duration: 11m 41s)
- 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b
- 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b to wdqs1010.eqiad.wmnet (duration: 00m 29s)
- 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b to wdqs1010.eqiad.wmnet
- 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
- 08:18 dcausse: restarting blazegraph on wdqs1006: T242453
- 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
- 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
- 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
- 06:46 elukey: run puppet on all ores[12]* nodes
- 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
- 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private (T244390)
- 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
- 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
- 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private (T244390)
- 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
2020-02-05
- 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
- 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa (duration: 10m 48s)
- 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa
- 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) (T244389)
- 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3 (duration: 03m 07s)
- 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3
- 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
- 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
- 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
- 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
- 20:50 mutante: ores1004 - systemctl start celery-ores-worker
- 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18 refs T233866 (duration: 01m 07s)
- 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs T233866
- 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
- 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
- 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
- 20:25 mutante: mw1267 restarting php7.2-fpm
- 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
- 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
- 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs T233866
- 20:09 moritzm: installing git security updates for jessie
- 20:00 moritzm: installing unzip security updates
- 19:44 mutante: LDAP - added spramduya to wmf group (T243802)
- 19:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
- 19:38 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
- 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238029 Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
- 19:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync back revert of 975b4bbb9 (duration: 01m 06s)
- 19:10 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
- 18:35 vgutierrez: pooling cp5012 - T242093
- 18:23 vgutierrez: rebooting cp5012 - T242093
- 18:21 elukey: restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
- 17:51 mutante: ganeti1017 - rebooting (not in use yet)
- 17:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/languages/: T244300 (duration: 01m 13s)
- 17:33 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/includes/: T244300 (duration: 01m 14s)
- 16:53 urandom: Sessionstore deployment (mediawiki-config) is done
- 16:37 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:569678 Config: Enable sessionstore on group0 and 1 T243106 (duration: 01m 08s)
- 16:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232140 Restore wgLogoHD to wikis without a MinervaCustomLogos defined (duration: 01m 09s)
- 16:07 elukey: update puppet compiler's facts
- 15:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 15:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 15:29 effie: restart php-fpm on canaries - T236800
- 15:24 effie: Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - T236800
- 15:15 vgutierrez: depooling & reimaging cp5012 as buster - T242093
- 15:12 ema: cp: unset Accept-Encoding from ats-be requests to applayer T242478
- 14:35 vgutierrez: updating acme-chief to version 0.24 - T244236
- 14:32 _joe_: restarting mcrouter at nice -19 on mw1331 for testing effects of that change
- 14:30 vgutierrez: upload acme-chief 0.24 to apt.wm.o (buster) - T244236
- 14:26 XioNoX: push initial flowspec config to all routers
- 14:23 vgutierrez: pooling cp5006 - T242093
- 14:13 ema: cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues T242478
- 13:46 marostegui: Decrease buffer pool size on db1107 for testing - T242702
- 13:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
- 13:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
- 13:42 akosiaris: undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm
- 13:41 ema: cp1075: unset Accept-Encoding on origin server requests T242478
- 13:39 Amir1: EU SWAT is done
- 13:38 ema: cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ T242478
- 13:35 XioNoX: rollback traffic steering off cr2-eqord
- 13:29 akosiaris: manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency
- 13:25 XioNoX: reboot cr2-eqord for software upgrade - yaaaaa
- 13:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: Cache PropertyInfoLookup internally (T243955) (duration: 01m 07s)
- 13:17 XioNoX: increase ospf cost for cr2-eqord links