
Server Admin Log: Difference between revisions

imported>Stashbot
(krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 88ba4f8f4d49 (duration: 00m 55s))
imported>Nhatminh01
(fix something)
== 2019-09-01 ==
* 17:53 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=enwikiquote --verbose ([[phab:T231137|T231137]])
* 17:45 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=metawiki --verbose ([[phab:T231137|T231137]])
* 17:33 Urbanecm: Run foreachwikiindblist group1.dblist extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose ([[phab:T231137|T231137]])
* 17:29 Urbanecm: Previous should be *group0.dblist ([[phab:T231137|T231137]])
* 17:29 Urbanecm: Run foreachwikiindblist group0 extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose ([[phab:T231137|T231137]])

== 2019-08-31 ==
* 13:33 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|88ba4f8f4d49}} (duration: 00m 55s)

== 2019-08-30 ==
* 22:30 ejegg: disabled fundraising targetsmart import jobs
* 22:09 gehel: regenerating tiles around Lake Huron for maps eqiad / codfw - [[phab:T231691|T231691]]
* 22:04 gehel: forcing osm replication on maps[12]004 - lake Huron has dried up
* 19:33 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Kartographer/includes/ApiQueryMapData.php: [[phab:T231561|T231561]] UBN fix for PHP fatal when ParserOutput has no map data (duration: 00m 56s)
* 19:24 ebernhardson: cloudelastic-chi index.merge.policy.deletes_pct_allowed=20
* 16:45 urandom: restarting restbase2017-b with live hack startup script (adds logging) -- [[phab:T231027|T231027]]
* 16:38 ebernhardson: cloudelastic-chi all indices auto_expand_replicas set to '0-1'
* 14:17 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Thanks/includes/: [[phab:T231617|T231617]] - {{Gerrit|8a3c458c4d937}} (duration: 00m 54s)
* 13:19 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 13:19 ema: cp1075: pause ats-be testing during the weekend [[phab:T228629|T228629]]
* 12:43 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 12:12 marostegui: Start replication s2 on labsdb1009 and labsdb1010
* 11:57 marostegui: Start replication s2 on labsdb1011
* 11:48 marostegui: Start s2 replication on labsdb1012
* 11:33 jynus: switching db1125:s2 (eqiad sanitarium) to replicate from codfw [[phab:T231638|T231638]]
* 11:31 marostegui: Temporarily stop s2 replication on labsdb1009-labsdb1012
* 10:23 jynus: resetting db1074 from iLO
* 10:10 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Mirror dbctl depool of db1074 (duration: 00m 55s)
* 09:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074 after crash', diff saved to https://phabricator.wikimedia.org/P9013 and previous config saved to /var/cache/conftool/dbconfig/20190830-095747-jynus.json
* 09:24 ema: cp1075: depool ats-be due to low but constant 504 rate after 8.0.5-1wm4 upgrade
* 09:20 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 09:13 ema: cp1075: upgrade ATS to 8.0.5-1wm4
* 08:50 ema: repool ats-be on cp1075 and verify if [[phab:T231504|T231504]] is fixed
* 08:49 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9011 and previous config saved to /var/cache/conftool/dbconfig/20190830-080334-marostegui.json
* 07:42 marostegui: Upgrade db2055 db2071 db2072 db2092
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9010 and previous config saved to /var/cache/conftool/dbconfig/20190830-071043-marostegui.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9009 and previous config saved to /var/cache/conftool/dbconfig/20190830-063949-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9008 and previous config saved to /var/cache/conftool/dbconfig/20190830-062517-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9007 and previous config saved to /var/cache/conftool/dbconfig/20190830-061546-marostegui.json
* 06:07 marostegui: Upgrade db1076
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for upgrade - [[phab:T230785|T230785]]', diff saved to https://phabricator.wikimedia.org/P9006 and previous config saved to /var/cache/conftool/dbconfig/20190830-060702-marostegui.json
* 05:25 marostegui: Stop MySQL on db2060 - [[phab:T231625|T231625]]
* 05:23 marostegui: Remove db2060 from tendril and zarcillo - [[phab:T231625|T231625]]
* 05:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2060 from config [[phab:T231625|T231625]] (duration: 00m 53s)
* 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2060 from config [[phab:T231625|T231625]] (duration: 00m 53s)
* 05:10 marostegui: Restart wikibugs
 
== 2019-08-29 ==
* 23:23 ejegg: updated payments-wiki from {{Gerrit|1d5d7503b0}} to {{Gerrit|51d9ed79b6}}
* 23:15 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|4cdfebe}} (duration: 00m 54s)
* 21:36 ejegg: re-enabled fundraising python jobs
* 20:18 ejegg: updated fundraising python tools from {{Gerrit|c0f4e7a379}} to {{Gerrit|b42bda6bf3}}
* 20:14 foks: removing two files for legal compliance
* 20:14 ejegg: disabled fundraising python jobs
* 19:56 ebernhardson: cloudelastic-chi run frwiki_content/_forcemerge?only_expunge_deletes=true to try and fix 5gb segments with 96% deleted documents
* 18:59 ebernhardson: restart elasticsearch on cloudelastic1003 ([[phab:T231517|T231517]])
* 18:50 ebernhardson: restart elasticsearch on cloudelastic1002 ([[phab:T231517|T231517]])
* 18:41 ebernhardson: set index.merge.scheduler.max_thread_count to null to accept default values on cloudelastic-chi ([[phab:T231517|T231517]])
* 18:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/AbuseFilter/includes/AbuseFilterVariableHolder.php: [[phab:T231542|T231542]] {{Gerrit|f37f0bd50cf}} (duration: 00m 53s)
* 18:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/CentralAuth/modules/ext.centralauth.ForeignApi.js: {{Gerrit|e7cd3cd313a4642}} (duration: 00m 55s)
* 18:23 ebernhardson: restart elasticsearch on cloudelastic1001 ([[phab:T231517|T231517]])
* 18:22 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Fix "Assign all rights assigned to suppress group to oversight group" ([[phab:T230601|T230601]]) (duration: 00m 54s)
* 18:07 ebernhardson: increase index.refresh_interval to 5m for all indices on cloudelastic-chi
* 17:22 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:19 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:15 dcausse: restarted elasticsearch on cloudelastic1004 ([[phab:T231517|T231517]])
* 17:10 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:09 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:09 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:59 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:49 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:49 crusnov@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:49 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:17 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 14:16 ema: depool ats-be on cp1075 to investigate [[phab:T231504|T231504]]
* 11:54 Lucas_WMDE: EU SWAT done
* 11:45 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/UploadWizard: [SDC] Add "copy statements" functionality (UploadWizard part) (duration: 00m 52s)
* 11:44 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Add "copy statements" functionality (MediaInfo part) (duration: 00m 54s)
* 11:37 mutante: scholarships.wikimedia.org app moving to new backend and using TLS. backend upgraded from jessie to stretch and PHP7 ([[phab:T210411|T210411]])
* 09:19 mutante: iegreview.wikimedia.org switched to new stretch backend and using TLS ([[phab:T210411|T210411]])
* 09:08 marostegui: Reboot db1133 to upgrade kernel - [[phab:T229657|T229657]]
* 08:43 marostegui: Change min_replicas to 4 on s2 for eqiad and codfw [[phab:T231019|T231019]]
* 08:41 mutante: cp1085 - puppet run stuck after Loading facts, possibly related to ACKed IPMI sensor status issue in Icinga
* 08:39 mutante: cp1085 - kill stuck puppet processes and run manually
* 08:36 marostegui: Change min_replicas to 4 on s4 for eqiad and codfw [[phab:T231019|T231019]]
* 08:30 marostegui: Change min_replicas to 2 on s3 for eqiad and codfw [[phab:T231019|T231019]]
* 08:26 mutante: running puppet on cp-text_eqiad
* 08:23 mutante: switching iegreview app to stretch backend with TLS and discovery record
* 08:23 kart_: Updated cxserver to 2019-08-29-074757-production ([[phab:T230200|T230200]])
* 08:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 08:18 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 08:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 08:11 _joe_: disabling zend GC on mw1347, testing a hypothesis for [[phab:T231011|T231011]]
* 08:03 _joe_: live tweak on mw1270: apc.ttl removed; apc size 4 GB; tideways disabled.
* 05:00 marostegui: Stop MySQL on db2053 for decommission [[phab:T231407|T231407]]
* 04:59 marostegui: Remove db2053 from tendril and zarcillo [[phab:T231407|T231407]]
* 03:28 ejegg: updated payments-wiki from {{Gerrit|231b7b0850}} to {{Gerrit|1d5d7503b0}}
 
== 2019-08-28 ==
* 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|8bfe43a}}: Add scielo.br to wgCopyUploadsDomains for commonswiki ([[phab:T231402|T231402]]) (duration: 00m 55s)
* 21:39 bd808: Set downtime/ack for showmount on labstore1004 ([[phab:T229448|T229448]])
* 21:03 ejegg: deleted fredge_multiqueue_consumer process-control job
* 19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php: [[phab:T231488|T231488]] Speculatively hot-deploy fix ahead of landing in git (duration: 00m 54s)
* 19:15 James_F: Live hacking php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php on mwdebug1002 for [[phab:T231488|T231488]]
* 18:57 XioNoX: update cloud firewall policies on cr1/2-eqiad - [[phab:T231418|T231418]]
* 18:32 urandom: rebooting restbase-dev1006 -- [[phab:T229421|T229421]]
* 18:25 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 53s)
* 18:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.20
* 17:42 XioNoX: re-enable both sides of the reline link between knams and esams - [[phab:T230448|T230448]]
* 17:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Kartographer/includes/ApiQueryMapData.php: [[phab:T231453|T231453]] Fix array access as object (duration: 00m 54s)
* 17:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/MobileFrontend/includes: [[phab:T231014|T231014]] Postpone call to MobileContext::shouldDisplayMobileView() (duration: 00m 55s)
* 16:51 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgGraphIsTrusted (no longer used) (duration: 00m 56s)
* 16:06 hashar: upgrading Jenkins on contint1001
* 16:03 mutante: imported new jenkins package to thirdparty/ci stretch-wikimedia
* 16:01 hashar: contint2001: upgraded Debian packages / Jenkins
* 15:15 jeh: restart puppetdb on compiler1002.puppet-diffs.eqiad.wmflabs
* 14:35 mutante: racktables - down for maintenance
* 13:59 ema: cp1075 ats-be repooled to resume testing [[phab:T228629|T228629]]
* 13:58 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 13:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.34.0-wmf.20"
* 13:30 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 55s)
* 13:29 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.20
* 13:20 marostegui: Change min_replicas to 4 on s8 for eqiad and codfw [[phab:T231019|T231019]]
* 13:18 marostegui: Change min_replicas to 3 on s6 for eqiad and codfw [[phab:T231019|T231019]]
* 13:15 marostegui: Optimize pc2010 after deleting old rows - [[phab:T210725|T210725]]
* 12:17 hashar: contint1001: manually gzip a few mw-debug-cli.log.gz files # [[phab:T219850|T219850]]
* 12:06 Urbanecm: Closing EU SWAT
* 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|389919f}}: [rowiki] Allow sysops to name patrollers ([[phab:T231099|T231099]]) (duration: 00m 53s)
* 12:03 Urbanecm: EU SWAT is taking a few mins out of the sanity break, last patch
* 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|34f1552}}: Disable search engine indexing in some namespaces of Icelandic Wikipedia ([[phab:T231179|T231179]]) (duration: 00m 54s)
* 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|2aebc15}}: Enable Page Previews as default on zhwikivoyage ([[phab:T230624|T230624]]) (duration: 00m 52s)
* 11:55 Urbanecm: Purge /static/images/project-logos/specieswiki-1.5x.png and /static/images/project-logos/specieswiki-2x.png ([[phab:T230113|T230113]])
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|f86baa3}}: Create HIDPI logo for Wikispecies ([[phab:T230113|T230113]]) (duration: 00m 52s)
* 11:52 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|f86baa3}}: Create HIDPI logo for Wikispecies (1/2, [[phab:T230113|T230113]]) (duration: 00m 54s)
* 11:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:532983{{!}}Enable WRITE_BOTH for items term store for testwikidatawiki (T225055)]] (duration: 00m 54s)
* 11:42 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Check existence of objects before using it (duration: 00m 54s)
* 11:31 marostegui: Optimize pc1010 after deleting old rows - [[phab:T210725|T210725]]
* 11:30 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:532966{{!}} Bumping portals to master (T128546)]] (duration: 00m 52s)
* 11:30 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:532966{{!}} Bumping portals to master (T128546)]] (duration: 00m 53s)
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|622cb63}}: Enable AMC Outreach modal ([[phab:T231206|T231206]]) (duration: 00m 54s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4ebddb8}}: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki ([[phab:T220752|T220752]]) (duration: 00m 55s)
* 11:04 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T212886|T212886]])
* 10:42 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T212886|T212886]])
* 09:58 vgutierrez: repooling cp5001 - [[phab:T231287|T231287]]
* 09:56 vgutierrez: upgrading trafficserver on cp5001 to version 8.0.5-1wm4 - [[phab:T231287|T231287]]
* 09:28 mutante: notebook1004 - systemctl start jupyter-ebernhardson-singleuser ([[phab:T231365|T231365]])
* 09:19 mutante: notebook1003 - systemctl start jupyter-iflorez-singleuser
* 09:14 mutante: mwdebug1002 - restart php-fpm
* 09:11 mutante: miscweb2001 - edit /etc/apache2/ports.conf and replace port 444 with 443 again; a2dismod ssl; systemctl restart apache2; systemctl restart envoyproxy; now also has envoy listening on 443, matches miscweb1001 and manual hack removed ([[phab:T210411|T210411]])
* 09:06 mutante: miscweb1001 - a2dismod ssl; restart apache - stop listening on 443 to make room for envoy
* 08:17 marostegui: Deploy grants on labsdb hosts for dbproxy1018 - [[phab:T202367|T202367]]
* 08:10 vgutierrez: uploaded trafficserver-8.0.5-1wm4 to apt.wikimedia.org (stretch) - [[phab:T231287|T231287]]
* 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:41 marostegui: Upgrade mysql on s7 codfw hosts: db2054, db2061, db2068, db2077 - [[phab:T230106|T230106]]
* 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2053 from config [[phab:T231407|T231407]] (duration: 00m 53s)
* 06:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2053 from config [[phab:T231407|T231407]] (duration: 00m 55s)
* 05:54 marostegui: Remove old rows from pc1010 - [[phab:T210725|T210725]]
* 05:19 marostegui: Start dropping neodymium grants across all the databases, parsercache, es, dbstore... [[phab:T229796|T229796]]
* 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 after optimize [[phab:T210725|T210725]] (duration: 00m 54s)
 
== 2019-08-27 ==
* 23:45 Urbanecm: Evening SWAT done
* 23:44 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.20/skins/MinervaNeue/: SWAT: {{Gerrit|4d04797}}: Restore contributions icon to non-AMC menu ([[phab:T231363|T231363]]) (duration: 00m 54s)
* 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1422870}}: [sqwikiquote] Enable WikiLove and SandboxLink ([[phab:T230390|T230390]]) (duration: 00m 54s)
* 23:36 Urbanecm: Run mwscript extensions/WikimediaMaintenance/createExtensionTables.php sqwikiquote wikilove ([[phab:T230390|T230390]])
* 23:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/MobileFrontend/resources/dist/: SWAT: {{Gerrit|a109b25}}: Build assets reflecting edit change (duration: 00m 55s)
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|3704bb7}}: Enable partial blocks on ruwiki ([[phab:T231298|T231298]]) (duration: 00m 54s)
* 23:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1687ec9}}: Whitelist *.wikimedia.cz in wgCopyUploadsDomains for commonswiki ([[phab:T231247|T231247]]) (duration: 00m 54s)
* 23:02 eileen: civicrm revision is {{Gerrit|049c9666b6}}, config revision is {{Gerrit|24aed9745e}}
* 22:40 eileen: civicrm revision changed from {{Gerrit|517e6ee4e0}} to {{Gerrit|049c9666b6}}, config revision is {{Gerrit|24aed9745e}}
* 22:29 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.GeneratedContentNode.js: [[phab:T231381|T231381]] Follow-up {{Gerrit|I196f5bd88}}: Fix typo (set node=this) (duration: 00m 57s)
* 21:51 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 21:11 XioNoX: disable both sides of the reline link between knams and esams - [[phab:T230448|T230448]]
* 20:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused wgEnableBlockNoticeStats setting (duration: 00m 54s)
* 19:08 gehel: starting deployment of Apache config for lexemes / SDoC - [[phab:T222321|T222321]]
* 18:59 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/gallery/ImageGalleryBase.php: [[phab:T231340|T231340]] [[phab:T231353|T231353]] BadFileLookup::isBadFile() expects null, not false for galleries (duration: 00m 53s)
* 18:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/api/ApiQueryImageInfo.php: [[phab:T231340|T231340]] [[phab:T231353|T231353]] BadFileLookup::isBadFile() expects null, not false for the API (duration: 00m 53s)
* 18:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/skins/MinervaNeue/skin.json: [[phab:T231358|T231358]] Fix userSandbox image path (duration: 00m 53s)
* 17:46 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1869f79]: Fix definition endpoint TypeError ([[phab:T230503|T230503]]) (duration: 04m 39s)
* 17:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/includes/password/PasswordPolicyChecks.php: {{Gerrit|098755622f7}} (duration: 00m 54s)
* 17:41 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1869f79]: Fix definition endpoint TypeError ([[phab:T230503|T230503]])
* 17:04 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Echo: {{Gerrit|34084279089f}} (duration: 00m 55s)
* 16:38 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/TwoColConflict/extension.json: {{Gerrit|d6b5d441b}}, [[phab:T229791|T229791]] (duration: 00m 55s)
* 15:41 James_F: That was [[phab:T231279|T231279]] Set `$wgRelatedArticlesDescriptionSource` to `wikidata`
* 15:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T231279|T231279]] Set to (duration: 00m 54s)
* 14:52 _joe_: running scap pull on mw1280
* 14:50 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 14:49 _joe_: powercycling mw1280
* 14:49 bblack: deploying anycast recdns resolv.conf setting to all codfw - [[phab:T228190|T228190]]
* 14:45 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.20
* 14:39 ema: cp1081: restart crashed services varnishkafka-{statsv,webrequest}.service
* 14:38 vgutierrez: depool cp5001 - [[phab:T231287|T231287]]
* 14:33 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.20 and rebuild l10n cache (duration: 30m 48s)
* 14:03 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.20 and rebuild l10n cache
* 13:52 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 [keeping static files] (duration: 01m 35s)
* 13:47 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.15 (duration: 06m 44s)
* 13:43 vgutierrez: repool cp5001 - [[phab:T231287|T231287]]
* 12:36 ema: pool cp1075 w/ ATS backend (for real) [[phab:T228629|T228629]]
* 12:29 marostegui: Rename table filejournal on enwiki on db1089 - [[phab:T51195|T51195]]
* 12:17 ema: depool cp1075, confd is not watching the key "ats-be"
* 12:15 ema: pool cp1075 w/ ATS backend [[phab:T228629|T228629]]
* 11:55 mutante: miscweb1001 - a2dismod mpm_event ; a2enmod php7.0 ; systemctl restart apache2 ([[phab:T224247|T224247]], [[phab:T196968|T196968]]) please also see https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206
* 11:52 dcausse: EU Swat done
* 11:51 mutante: miscweb1001 - manually remove tin.eqiad.wmnet (!) from /srv/iegreview/iegreview-cache/.config and replace with deploy1001 after first puppet run. still existing bug that tin is not fully removed ([[phab:T224247|T224247]], [[phab:T175288|T175288]], [[phab:T197470|T197470]])
* 11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [[phab:T231194|T231194]] [cirrus] Stop generating new cirrusSearchChecker jobs (duration: 00m 45s)
* 11:43 dcausse: reopening EU SWAT
* 11:18 raynor: EU SWAT finished
* 11:15 vgutierrez: depooling cp5001
* 11:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:532422{{!}}Drop MobileWebUIActionsTracking sampling rate to 0.01% (T220016)]] (duration: 00m 46s)
* 11:10 mutante: ganeti1001 - starting and OS install of new VM miscweb1001
* 10:25 marostegui: Remove grants from sarin from all the dbs, dbstore, parsercache, es, labsdb - [[phab:T229796|T229796]]
* 10:25 mutante: ganeti eqiad - creating new VM with same specs as krypton to replace it with a stretch instance and mirror miscweb2001. krypton to be removed ([[phab:T224323|T224323]], [[phab:T105507|T105507]], [[phab:T224247|T224247]])
* 10:12 dcausse: cirrus: reindexing lost updates since 2019-08-12T10:00:00Z for wikitech ([[phab:T230994|T230994]])
* 09:39 marostegui: Deploy grants for dbproxy1016 on m3 - [[phab:T202367|T202367]]
* 09:21 marostegui: Remove grants for dbproxy1004 and dbproxy1009 from m4 hosts (db1107 and db1108) - [[phab:T231280|T231280]]
* 09:21 vgutierrez: upgrading trafficserver to version 8.0.5-1wm3 on cp5001 - [[phab:T221594|T221594]]
* 09:20 vgutierrez: uploaded trafficserver-8.0.5-1wm3 to apt.wikimedia.org (stretch) - [[phab:T221594|T221594]]
* 09:11 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@c2bc1a3]: Increase cirrusSearchLinksUpdate concurrency to 150 - [[phab:T231194|T231194]] (duration: 01m 09s)
* 09:09 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@c2bc1a3]: Increase cirrusSearchLinksUpdate concurrency to 150 - [[phab:T231194|T231194]]
* 08:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:44 ema@cumin1001: START - Cookbook sre.hosts.downtime
* 08:36 vgutierrez: repooling cp5001 - [[phab:T231262|T231262]]
* 08:18 ema: depool cp1075 and reimage as text_ats [[phab:T228629|T228629]]
* 07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s6 codfw weights and roles [[phab:T230106|T230106]] (duration: 00m 44s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=codfw): 'Reorganize s6 codfw weights and roles [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8983 and previous config saved to /var/cache/conftool/dbconfig/20190827-074802-marostegui.json
* 07:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2129 as s6 codfw master [[phab:T230106|T230106]] (duration: 00m 46s)
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2046, this host will be decommissioned [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8982 and previous config saved to /var/cache/conftool/dbconfig/20190827-072847-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=codfw): 'Promote db2129 to codfw s6 master [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8981 and previous config saved to /var/cache/conftool/dbconfig/20190827-072556-marostegui.json
* 07:16 marostegui: Switchover codfw s6 master from db2046 to db2129 [[phab:T230106|T230106]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2129 weight to 0 before promoting it to codfw s6 master [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8980 and previous config saved to /var/cache/conftool/dbconfig/20190827-071456-marostegui.json
* 07:07 vgutierrez: depooling cp5001 - [[phab:T231262|T231262]]
* 07:04 vgutierrez: repooling cp5001 - [[phab:T231262|T231262]]
* 06:11 _joe_: updating reprepro sources for jessie-wikimedia
* 05:36 XioNoX: update cloud acls on cr1/2-eqiad - [[phab:T230980|T230980]]
* 05:28 marostegui: Optimize pc1009 - [[phab:T210725|T210725]]
* 05:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 for optimize [[phab:T210725|T210725]] (duration: 00m 45s)
* 05:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc1009 for optimize [[phab:T210725|T210725]] (duration: 00m 45s)
* 05:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after optimize [[phab:T210725|T210725]] (duration: 00m 47s)
* 03:59 vgutierrez: depooling cp5001 - [[phab:T231262|T231262]]
* 03:53 vgutierrez: repooling cp5001 - [[phab:T231262|T231262]]
* 02:59 vgutierrez: rebooting cp5001
* 01:47 eileen: process-control config revision is {{Gerrit|24aed9745e}}
* 00:18 eileen: civicrm revision changed from {{Gerrit|ab2a9b264b}} to {{Gerrit|517e6ee4e0}}, config revision is {{Gerrit|8c900d909f}}
 
== 2019-08-26 ==
* 20:50 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0463394]: Update mobileapps to {{Gerrit|6bdc333}} (duration: 06m 18s)
* 20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0463394]: Update mobileapps to {{Gerrit|6bdc333}}
* 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@d9042a1]: Update mobileapps to {{Gerrit|fbe3cc6}} (duration: 13m 08s)
* 20:12 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@d9042a1]: Update mobileapps to {{Gerrit|fbe3cc6}}
* 18:30 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:528546]] lvwiki damaging model adjustment (duration: 00m 46s)
* 18:15 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:528506]] Enable Related Article cards in Timeless across all projects (duration: 00m 46s)
* 17:53 XioNoX: add new IP to labsdb-tcp4 on cr1/2-eqiad - [[phab:T230980|T230980]]
* 17:34 herron: beginning roll out of prometheus-ipsec-exporter in ulsfo [[phab:T230236|T230236]]
* 15:38 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T228051|T228051]] Load the Translate extension via static extension registration (duration: 00m 46s)
* 15:02 vgutierrez: depooling cp5001
* 14:59 marostegui: Change min_replicas to 3 on s5 for eqiad and codfw [[phab:T231019|T231019]]
* 14:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 14:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 14:05 vgutierrez: repooling cp5001 using trafficserver as TLS termination layer - [[phab:T221594|T221594]]
* 14:02 herron: uploaded prometheus-ipsec-exporter-0.3.1-1 package to stretch-wikimedia and buster-wikimedia
* 14:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 13:58 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 13:49 vgutierrez: upgraded trafficserver to version 8.0.5-1wm2 in cp5001
* 13:49 marostegui: Rename table filejournal on enwiki on db2112 - [[phab:T51195|T51195]]
* 13:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - [[phab:T223953|T223953]] (duration: 23m 00s)
* 13:28 vgutierrez: Replacing nginx with ats-tls in cp5001 - [[phab:T221594|T221594]]
* 13:21 marostegui: Change MySQL.monitoring queries latency graph parameters to support buster+mariadb 10.3 - [[phab:T231190|T231190]]
* 13:15 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - [[phab:T223953|T223953]]
* 13:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - [[phab:T223953|T223953]] (duration: 03m 22s)
* 13:06 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - [[phab:T223953|T223953]]
* 13:06 mobrovac@deploy1001: deploy aborted: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - [[phab:T223953|T223953]] (duration: 00m 04s)
* 13:06 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d]: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - [[phab:T223953|T223953]]
* 12:48 marostegui: Restart MySQL on db2114 to pick up binlog format change
* 12:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2114 status (duration: 00m 45s)
* 11:57 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@e742ecf]: Increase the concurrency of cirusSearchCheckerJobs to 20 - [[phab:T231194|T231194]] (duration: 01m 31s)
* 11:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@e742ecf]: Increase the concurrency of cirusSearchCheckerJobs to 20 - [[phab:T231194|T231194]]
* 11:36 Amir1: EU SWAT is done
* 11:34 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/UniversalLanguageSelector: SWAT: [[gerrit:532341{{!}}Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList" (T217770 T121747)]] (duration: 00m 46s)
* 11:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:527087{{!}}Switch property terms migration to WRITE_NEW on client wikis (T225053)]] (duration: 00m 46s)
* 10:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:528092{{!}} Bumping portals to master (T128546)]] (duration: 00m 46s)
* 10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:528092{{!}} Bumping portals to master (T128546)]] (duration: 00m 46s)
* 10:26 vgutierrez: uploaded trafficserver-8.0.5-1wm2 to apt.wikimedia.org (stretch) - [[phab:T221594|T221594]]
* 09:54 _joe_: codfw/appserver/*/mw2231.codfw.wmnet: pooled changed yes => inactive [[phab:T231192|T231192]]
* 09:43 Urbanecm: Run scap pull on mwdebug1001, test ended
* 09:38 Urbanecm: Enable partial blocks on test2wiki and mwdebug1001 to test something
* 08:46 _joe_: hard powercycle of mw2231, down with a blank console
* 06:51 ema: cp-upload: rolling ats-backend-restart to enable compress plugin
* 05:25 marostegui: Upload new mariadb 10.3 packages to repo
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for optimize [[phab:T210725|T210725]] (duration: 02m 53s)
* 05:08 marostegui: Optimize tables on pc2009 - [[phab:T210725|T210725]]
 
== 2019-08-25 ==
* 13:46 volans: uploaded spicerack_0.0.27-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 02:22 cdanis: clear downtimes on cr2-eqiad/cr2-codfw, link supposedly stable now
* 00:35 herron: set icinga downtimes on flapping cr2-eqiad and cr2-codfw alerts until monday
 
== 2019-08-24 ==
* 15:27 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Shangkuanlc /home/urbanecm/T231129 ([[phab:T231129|T231129]])
 
== 2019-08-23 ==
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:34 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 22:05 eileen: process-control config revision is {{Gerrit|8c900d909f}}
* 21:48 XioNoX: increase ospf cost of zayo codfw-eqiad link to 1320 (was 320) to make it secondary
* 19:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:11 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:14 James_F: Dropped 2FA for User:DBrant (WMF), per request.
* 17:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:39 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 12:24 _joe_: pooling mw1270 temporarily, debugging performance issues
* 12:15 _joe_: depooling mw1270 temporarily, performance issues
* 11:13 marostegui: Upgrade db1114 from 10.3.16 to 10.3.17
* 10:06 dcausse: elastic: reindexing wikis with old mappings in eqiad & codfw ([[phab:T230990|T230990]])
* 05:52 moritzm: installing squid3 security updates
* 05:11 marostegui: Stop MySQL on db2066 for decommission [[phab:T230885|T230885]]
* 05:08 marostegui: Remove db2066 from tendril and zarcillo [[phab:T230885|T230885]]
 
== 2019-08-22 ==
* 23:41 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/WikibaseLexeme: SWAT: {{Gerrit|e4a5457}}: Fix Lexemes RDF generation ([[phab:T230974|T230974]]) (duration: 00m 49s)
* 23:32 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: {{Gerrit|eb1c4ea}}: Rename globals and rights in AbuseFilter config (duration: 00m 47s)
* 23:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|66b719d}}: General cleanup of `groupOverrides` ([[phab:T231041|T231041]]) (duration: 00m 47s)
* 23:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|872f4b0}}: Change language code for punjabiwikimedia, resyncing, got broken pipe at the end ([[phab:T230680|T230680]]) (duration: 00m 47s)
* 23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|872f4b0}}: Change language code for punjabiwikimedia ([[phab:T230680|T230680]]) (duration: 00m 48s)
* 23:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|a5917e4}}: Clean up `wgRateLimits` to remove unneeded entries ([[phab:T231040|T231040]]) (duration: 00m 48s)
* 22:07 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Update MachineVision Beta config (duration: 00m 47s)
* 21:19 eileen: tools revision changed from {{Gerrit|5c080bac63}} to {{Gerrit|c0f4e7a379}}
* 20:35 ejegg: updated payments-wiki from {{Gerrit|85dce8f79f}} to {{Gerrit|231b7b0850}}
* 17:14 elukey: remove analytics-tool1002 from ganeti - [[phab:T231021|T231021]]
* 17:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:37 _joe_: restarting php-fpm on mw1348 to observe the effect on the slowdown, [[phab:T231011|T231011]]
* 13:47 elukey: update puppet compiler's facts
* 13:42 jijiki: Restart php-fpm on mw1348 and mw1347
* 13:41 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.19
* 13:36 joal@deploy1001: Finished deploy [analytics/refinery@a9b99e9]: Regular weekly analytics deployment train (1 day late) (duration: 18m 57s)
* 13:32 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting PHP7 traffic back to 20% - [[phab:T219150|T219150]] (duration: 00m 57s)
* 13:27 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/client/: [[gerrit:531677{{!}}Revert "Use the backwards-compatible HTML ID for the wikidata item link" (T230958, T66315)]] (duration: 00m 58s)
* 13:18 joal@deploy1001: Started deploy [analytics/refinery@a9b99e9]: Regular weekly analytics deployment train (1 day late)
* 12:56 _joe_: restarting mw1270 with slowlog disabled
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:38 _joe_: disabled slowlog on mw1348, repooling after reload
* 12:37 jijiki: Pooling mw1347, not mw1247
* 12:35 jijiki: Pooling mw1247
* 12:16 moritzm: upgrading mariadb (packaged Debian version) on matomo1001
* 12:15 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:07 jijiki: Depooling mw1347 and mw1348
* 10:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 33.3% - [[phab:T219150|T219150]] (duration: 01m 01s)
* 09:09 ema: rolling ats-backend-restart to enable @debug system call family
* 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:17 moritzm: restarting oozie on an-coord1001
* 07:54 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/repo/: Backport for UBN [[gerrit:531527{{!}}Hack to avoid trying to termbox render page before save (T230937)]] (duration: 00m 56s)
* 07:46 marostegui: Deploy grants on labsdb1009-labsdb1012 to allow connections for haproxy from dbproxy1019 - [[phab:T202367|T202367]]
* 06:52 moritzm: installing mariadb-10.1 updates from Stretch 9.9 point release (unrelated to wmf-mariadb, mostly client-side clients/libraries as shipped in Debian)
* 06:37 moritzm: installing python-pip updates from Stretch 9.9 point release
* 05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2066 from config [[phab:T230885|T230885]] (duration: 00m 54s)
* 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2066 from config [[phab:T230885|T230885]] (duration: 00m 54s)
* 05:14 marostegui: Remove db2059 from tendril and zarcillo - [[phab:T230884|T230884]]
* 05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2059 from config [[phab:T230884|T230884]] (duration: 00m 55s)
* 05:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2059 from config [[phab:T230884|T230884]] (duration: 00m 59s)
* 00:37 XioNoX: run /usr/local/sbin/restart-php7.2-fpm on mwdebug1001/2
* 00:27 XioNoX: push L3 ECMP to eqiad routers - [[phab:T230955|T230955]]
* 00:23 XioNoX: push L3 ECMP to esams routers - [[phab:T230955|T230955]]
* 00:22 XioNoX: push L3 ECMP to eqsin routers - [[phab:T230955|T230955]]
* 00:21 twentyafterfour: phabricator update completed without incident
* 00:19 XioNoX: push L3 ECMP to codfw routers - [[phab:T230955|T230955]]
* 00:15 twentyafterfour: Starting phabricator upgrade from tag release/2019-08-14/1 to release/2019-08-22/1
 
== 2019-08-21 ==
* 21:44 eileen: civicrm revision changed from {{Gerrit|d7370a9d0b}} to {{Gerrit|ab2a9b264b}}, config revision is {{Gerrit|58cd6b7ae6}}
* 20:51 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@fc270fd]: bulk_daemon: Retune popularity_score bulk sizing (duration: 03m 49s)
* 20:48 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@fc270fd]: bulk_daemon: Retune popularity_score bulk sizing
* 20:17 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@67103e9]: bulk_daemon: Correct super() call (duration: 04m 19s)
* 20:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@67103e9]: bulk_daemon: Correct super() call
* 20:02 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@556c4d0]: bulk_daemon: Track timeouts, log indices used, increase thread counts (duration: 04m 42s)
* 20:00 XioNoX: test l3 ECMP in ulsfo
* 19:57 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@556c4d0]: bulk_daemon: Track timeouts, log indices used, increase thread counts
* 19:54 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Track timeouts, log indices used, increase thread counts (duration: 02m 34s)
* 19:52 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Track timeouts, log indices used, increase thread counts
* 19:34 XioNoX: repool codfw and eqsin - [[phab:T226422|T226422]]
* 19:31 XioNoX: Rollback: Varnish: redirect eqsin/ulsfo text to eqiad - [[phab:T226422|T226422]]
* 19:29 ayounsi@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 19:29 ayounsi@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 19:26 XioNoX: rollback: increase OSPF cost on cr2-codfw links - [[phab:T226422|T226422]]
* 19:25 XioNoX: rollback deactivate transit links on cr2-codfw - [[phab:T226422|T226422]]
* 19:24 XioNoX: rollback: move VRRP master from cr2-codfw to cr1-codfw - [[phab:T226422|T226422]]
* 19:16 XioNoX: restart both REs on cr2-codfw - [[phab:T226422|T226422]]
* 19:14 XioNoX: failover master RE to RE0 on cr2-codfw - [[phab:T226422|T226422]]
* 18:37 XioNoX: shutdown re0:cr2-codfw (backup) - [[phab:T226422|T226422]]
* 18:32 XioNoX: failover master RE to RE1 on cr2-codfw - [[phab:T226422|T226422]]
* 18:19 XioNoX: shutdown re1:cr2-codfw (backup) - [[phab:T226422|T226422]]
* 18:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.19/includes/specialpage/RedirectSpecialPage.php: [[phab:T230932|T230932]] RedirectSpecialArticle: Fix PHP notice about undefined index (duration: 00m 54s)
* 18:18 XioNoX: move VRRP master from cr2-codfw to cr1-codfw - [[phab:T226422|T226422]]
* 18:15 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 18:15 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 18:15 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/client/: SWAT: [[gerrit:531528{{!}}Use the backwards-compatible HTML ID for the wikidata item link (T66315)]] (duration: 00m 58s)
* 18:14 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 18:14 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 18:12 XioNoX: deactivate transit links on cr2-codfw - [[phab:T226422|T226422]]
* 18:04 XioNoX: increase OSPF cost on cr2-codfw links - [[phab:T226422|T226422]]
* 17:56 XioNoX: rollback: apply BGP graceful shutdown to cr1-codfw transits - [[phab:T226422|T226422]]
* 17:55 XioNoX: Rollback: increase OSPF cost on ulsfo-codfw link - [[phab:T226422|T226422]]
* 17:53 XioNoX: rollback: disable BGP from cr1-codfw to lvs2001/2/3 - [[phab:T226422|T226422]]
* 17:43 XioNoX: restart both REs on cr1-codfw - [[phab:T226422|T226422]]
* 17:33 XioNoX: failover master RE to RE0 on cr1-codfw - [[phab:T226422|T226422]]
* 17:33 cmjohnson1: cloudvirt1015 down for a new motherboard
* 17:25 XioNoX: shutdown RE0 on cr1-codfw - [[phab:T226422|T226422]]
* 17:17 bstorm_: reboot cloudvirt1024 to try and reset raid [[phab:T230289|T230289]]
* 17:17 XioNoX: failover master RE to RE1 on cr1-codfw - [[phab:T226422|T226422]]
* 17:08 XioNoX: disable BGP from cr1-codfw to lvs2001/2/3 - [[phab:T226422|T226422]]
* 17:02 cmjohnson1: rebooting cloudvirt1024
* 17:00 tarrow: continuing the SWAT window to backport train blocker fixes
* 16:56 XioNoX: Varnish: redirect eqsin/ulsfo text to eqiad - [[phab:T226422|T226422]]
* 16:51 XioNoX: increase OSPF cost on ulsfo-codfw link - [[phab:T226422|T226422]]
* 16:46 XioNoX: apply BGP graceful shutdown to cr1-codfw transits - [[phab:T226422|T226422]]
* 16:37 XioNoX: depool eqsin and codfw - [[phab:T226422|T226422]]
* 16:01 moritzm: fixed apt config on krypton, broken getenvoy-jessie.list made apt-get update fail
* 15:16 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32 (duration: 00m 25s)
* 15:15 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32
* 15:07 moritzm: installing python-cryptography update from Stretch point release
* 15:00 jbond42: adding interface::add_ip6_mapped to MediaWiki servers
* 14:46 elukey@deploy1001: Finished deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1 (duration: 00m 33s)
* 14:46 elukey@deploy1001: Started deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1
* 14:42 moritzm: installing java-common update from Stretch point release
* 14:36 moritzm: installing dns-root-data update from Stretch point release
* 14:29 godog: silence average mw appserver latency alerts for 24h, too noisy
* 14:28 elukey: swap turnilo backend in varnish from analytics-tool1002 to an-tool1007
* 14:27 moritzm: installing ca-certificates-java update from Stretch point release
* 14:10 marostegui: Upgrade mysql on db2075
* 13:12 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.19 (duration: 00m 55s)
* 13:11 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.19
* 11:59 jbond42: add ipv6 mapped address to mw codfw servers
* 11:41 Amir1: EU SWAT is done
* 11:38 jijiki: Restarting ores on ores1004 and ores1005
* 11:37 elukey: restart celery-ores-worker on ores1002
* 10:57 Urbanecm: Run scap pull on mwdebug1002 ([[phab:T230601|T230601]])
* 10:52 Urbanecm: Move 0a87e3c's code to abusefilter.php on mwdebug1002 ([[phab:T230601|T230601]])
* 10:49 Urbanecm: Previous log entry was for mwdebug1002
* 10:49 Urbanecm: Wrapped code added to CommonSettings.php in [[phab:T230601|T230601]] to wgExtensionFunctions
* 10:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=zhwikisource --add-prefix=FIXME --fix ([[phab:T230548|T230548]])
* 10:02 moritzm: installing puppetdb1002
* 09:46 tarrow: finished enabling termbox on wikidatawiki
* 09:36 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:531433{{!}}Enable Termbox on wikidatawiki (T230896)]] (duration: 00m 55s)
* 09:29 moritzm: rebooting db2102 (reverting to a proper stretch 4.9 kernel, it used a bpo kernel due to some hardware debugging a while back)
* 09:20 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 09:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 09:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 09:07 _joe_: uploaded python-poolcounter to stretch,buster
* 08:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 08:29 moritzm: upgrading PHP on contint*
* 08:18 moritzm: installing puppetdb2002
* 08:11 marostegui: Stop MySQL on db2052 [[phab:T230883|T230883]]
* 08:11 marostegui: Remove db2052 from tendril and zarcillo [[phab:T230883|T230883]]
* 08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2052 from config [[phab:T230883|T230883]] (duration: 00m 54s)
* 08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2052 from config [[phab:T230883|T230883]] (duration: 00m 54s)
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P8953 and previous config saved to /var/cache/conftool/dbconfig/20190821-075813-marostegui.json
* 07:56 ema: upload@eqsin: rolling ats-backend-restart to enable compress plugin
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1122', diff saved to https://phabricator.wikimedia.org/P8952 and previous config saved to /var/cache/conftool/dbconfig/20190821-054542-marostegui.json
* 05:28 eileen: civicrm revision changed from {{Gerrit|0d1b7f107a}} to {{Gerrit|d7370a9d0b}}, config revision is {{Gerrit|58cd6b7ae6}}
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1122', diff saved to https://phabricator.wikimedia.org/P8951 and previous config saved to /var/cache/conftool/dbconfig/20190821-052613-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122 after restart', diff saved to https://phabricator.wikimedia.org/P8950 and previous config saved to /var/cache/conftool/dbconfig/20190821-051441-marostegui.json
* 05:05 marostegui: Restart MySQL on db1122 for binlog format change - [[phab:T230785|T230785]]
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for binlog format change', diff saved to https://phabricator.wikimedia.org/P8949 and previous config saved to /var/cache/conftool/dbconfig/20190821-050501-marostegui.json
* 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1122 status: candidate master for s2 - [[phab:T230785|T230785]] (duration: 00m 55s)
* 02:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Handle non-integer status_code in json response (duration: 04m 09s)
* 02:24 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Handle non-integer status_code in json response
 
== 2019-08-20 ==
* 23:53 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@a7bf6cf]: bulk_daemon: Increase bulk request_timeout (duration: 03m 40s)
* 23:50 eileen: that just changes us to php7 csv so watch for any fail mail
* 23:49 eileen: civicrm revision changed from {{Gerrit|9c7b2ffbc9}} to {{Gerrit|0d1b7f107a}}, config revision is {{Gerrit|58cd6b7ae6}}
* 23:49 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@a7bf6cf]: bulk_daemon: Increase bulk request_timeout
* 23:42 Urbanecm: Evening SWAT aborted due to no logs logged for some period of time ([[phab:T230847|T230847]]), no patches were reverted
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|fd2cece}}: Enable RelatedArticles on all skins on eswikinews ([[phab:T230660|T230660]]) (duration: 00m 52s)
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/throttle-analyze.php: SWAT: {{Gerrit|a3927a7}}: Grant skipcaptcha to everyone coming from whitelisted IP ([[phab:T227487|T227487]]) (duration: 00m 54s)
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|13be059}}: Disable Wikimedia ReadingDepth ([[phab:T229042|T229042]]) (duration: 00m 56s)
* 23:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c08257}}: Remove unused remnant from old menu click tracking ([[phab:T228681|T228681]]) (duration: 00m 55s)
* 23:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|b94b647}}: Update wgSkipSkins to experiment with not showing skins to users ([[phab:T223824|T223824]]) (duration: 00m 58s)
* 21:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9b40607]: bulk_daemon: Increase max_poll_interval_ms to 15 minutes (duration: 06m 22s)
* 21:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9b40607]: bulk_daemon: Increase max_poll_interval_ms to 15 minutes
* 20:49 XioNoX: push BGP_Wikimedia_pops to ams - [[phab:T227808|T227808]]
* 19:28 XioNoX: push BGP_Wikimedia_pops to eqsin - [[phab:T227808|T227808]]
* 19:25 thcipriani: cleanup old (pre 1.34.0-wmf.14) wmf/* branches for core and extensions on gerrit
* 19:25 XioNoX: push BGP_Wikimedia_pops to cr4-ulsfo - [[phab:T227808|T227808]]
* 19:04 XioNoX: push BGP_Wikimedia_pops to cr3-ulsfo - [[phab:T227808|T227808]]
* 19:00 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|80a6743dd}} noc: read dbctl JSON [[phab:T229631|T229631]] (duration: 00m 58s)
* 17:57 bblack: deploying anycast recdns settings to resolv.conf on 41 live hosts in eqiad - https://gerrit.wikimedia.org/r/528524 - [[phab:T228190|T228190]]
* 16:54 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕐☕ sudo chmod g+w -R /srv/mediawiki-staging/
* 16:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.17/includes/resourceloader/ResourceLoaderWikiModule.php: [[phab:T229433|T229433]] - {{Gerrit|f84a4abb418de8}} (debugging) (duration: 00m 56s)
* 16:42 Krinkle: php-1.34.0-wmf.17/extensions/TimedMediaHandler is dirty. A merged patch was not deployed - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/TimedMediaHandler/+/530558/
* 16:25 hoo: Updated the Wikidata property suggester with data from the 2019-08-12 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 16:15 krinkle@deploy1001: Synchronized php-1.34.0-wmf.19/includes/resourceloader/ResourceLoaderWikiModule.php: [[phab:T229433|T229433]] - {{Gerrit|44607c984016b}} (debugging) (duration: 00m 55s)
* 16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fa903b7}}: Enable DNS blacklist for es.wikiquote ([[phab:T230796|T230796]]) (duration: 00m 55s)
* 16:04 oblivian@deploy1001: Pruned MediaWiki: 1.34.0-wmf.13 (duration: 04m 09s)
* 15:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ab38dc}}: Restrict account creation on es.wikiquote to 1 day/IP ([[phab:T230796|T230796]]) (duration: 01m 00s)
* 15:49 urandom: creating Parsoid/PHP storage schema in restbase-dev -- [[phab:T230792|T230792]]
* 15:48 Urbanecm: Run sudo -u mwdeploy chmod g+w /srv/mediawiki-stagging/wmf-config on deploy1001
* 15:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.19
* 14:54 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.19 and rebuild l10n cache (duration: 30m 31s)
* 14:29 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:24 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.19 and rebuild l10n cache
* 14:21 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.15 [keeping static files] (duration: 01m 43s)
* 14:13 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.14 (duration: 06m 44s)
* 13:58 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 13:07 cdanis: ✔️ cdanis@cobalt.wikimedia.org ~ 🕘 sudo systemctl restart gerrit.service
* 13:03 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 12:05 awight@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase: SWAT: [[gerrit:530845{{!}}Initialize  DatabaseTermIdsResolver and DatabaseTypeIdsStore with repo database name in client. (T230119, T225053)]] (duration: 00m 52s)
* 10:51 marostegui: Stop MySQL on db2051 and db2056 for decommission [[phab:T230777|T230777]] [[phab:T230778|T230778]]
* 10:30 ema: cp5002: restart trafficserver for compress.so config change
* 10:11 tarrow: termbox 2nd smoketests finished
* 09:52 marostegui: Remove db2051 and db2056 from tendril and zarcillo - [[phab:T230777|T230777]] [[phab:T230778|T230778]]
* 09:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2051 and db2056 from config [[phab:T230777|T230777]] [[phab:T230778|T230778]] (duration: 00m 48s)
* 09:00 tarrow: Starting 2nd smoketest of termbox service on eqiad: [[phab:T229907|T229907]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2123 as codfw s5 master - [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8936 and previous config saved to /var/cache/conftool/dbconfig/20190820-082802-marostegui.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s5 codfw weights - [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8935 and previous config saved to /var/cache/conftool/dbconfig/20190820-082411-marostegui.json
* 08:19 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2123 to s5 codfw master [[phab:T230106|T230106]] (duration: 00m 48s)
* 08:05 marostegui: Switchover s5 codfw master db2052 -> db2123 [[phab:T230106|T230106]]
* 07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s5 codfw weights [[phab:T230106|T230106]] (duration: 00m 47s)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2059 and db2066, those two will be decommissioned [[phab:T228258|T228258]]', diff saved to https://phabricator.wikimedia.org/P8934 and previous config saved to /var/cache/conftool/dbconfig/20190820-074900-marostegui.json
* 06:59 moritzm: installing failoid1001/2001 [[phab:T229903|T229903]]
* 05:59 marostegui: Stop MySQL and shut down db1114 for on-site maintenance - [[phab:T229452|T229452]]
* 05:55 marostegui: Stop MySQL on db2044 for decommissioning - [[phab:T221594|T221594]]
* 05:37 marostegui: Remove db2049 from tendril and zarcillo [[phab:T230721|T230721]]
* 05:35 marostegui: Stop MySQL on db2049 for decommissioning - [[phab:T230721|T230721]]
* 05:24 marostegui: Reload haproxy on dbproxy2002 [[phab:T230705|T230705]]
* 05:18 marostegui: Switchover m2 codfw master, db2044 -> db2067 [[phab:T230705|T230705]]
 
== 2019-08-19 ==
* 21:21 ejegg: updated payments-wiki from {{Gerrit|7b8091ba87}} to {{Gerrit|85dce8f79f}}
* 21:21 ejegg: updated payments-wiki subdesarrollo
* 19:35 ejegg: updated payments-wiki from {{Gerrit|e3b378f65d}} to {{Gerrit|7b8091ba87}}
* 18:57 Urbanecm: Morning SWAT done
* 18:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise rollback limit for all groups ([[phab:T228708|T228708]]) (duration: 00m 48s)
* 18:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|26317c7}}: Fix zhwikisource wgExtraNamespaces entry ([[phab:T230294|T230294]]) (duration: 00m 48s)
* 18:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|b21bbc0}}: Add `WS` and `CAT` as aliases for zhwikisource namespaces ([[phab:T230548|T230548]]) (duration: 00m 47s)
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|0a87e3c}}: Assign all rights assigned to suppress group to oversight group ([[phab:T230601|T230601]]) (duration: 00m 48s)
* 17:56 ebernhar1son: freeze cloudelastic writes to let prod clear 30 min backlog
* 17:23 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup (duration: 18m 18s)
* 17:17 shdubsh: restarting icinga to disable UI autocomplete
* 17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup
* 16:45 onimisionipe: pool elastic2050. mgmt issue has been resolved - [[phab:T230597|T230597]]
* 15:39 ejegg: updated payments-wiki from {{Gerrit|00eb090dcc}} to {{Gerrit|e3b378f65d}}
* 13:57 vgutierrez: repooling cp5001
* 12:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2049 from config [[phab:T230721|T230721]] (duration: 00m 48s)
* 12:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2049 from config [[phab:T230721|T230721]] (duration: 00m 48s)
* 12:38 vgutierrez: depooling cp5001 prior to ats-tls deployment
* 12:02 Urbanecm: EU SWAT done
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert {{Gerrit|483691c}} ([[phab:T225053|T225053]]) (duration: 00m 48s)
* 11:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|483691c}}: Revert "Revert "Switch property terms migration to WRITE_NEW on client wikis"" ([[phab:T225053|T225053]]) (duration: 00m 48s)
* 11:15 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:00 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 10:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:53 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 10:52 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:530826{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:530826{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:22 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:57 jbond42: add mapped ipv6 to conf200* servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/528475
* 09:26 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 godog: add 100G to graphite1004 / graphite2003 /srv LVs
* 07:59 onimisionipe: shutdown elastic2050 to prepare for mgmt reset - [[phab:T230597|T230597]]
* 07:40 marostegui: Redact napwikisource on db1124 and db2094 - [[phab:T210762|T210762]]
* 07:19 moritzm: installing golang-1.11 security updates on buster
* 07:08 moritzm: installing ffmpeg security updates on buster
* 06:37 vgutierrez: upgrading acme-chief to version 0.20 on production servers - [[phab:T229096|T229096]]
* 06:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
* 06:29 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
* 06:28 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
* 06:27 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
* 06:26 moritzm: installing ghostscript security updates on scb/proton/notebook* hosts
* 06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
* 06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
* 06:24 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
* 06:22 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
* 06:21 vgutierrez: rolling upgrade of nginx in ncredir hosts
* 06:03 moritzm: installing php5 security updates
* 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2067 from config [[phab:T230705|T230705]]  (duration: 00m 47s)
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2067 from config [[phab:T230705|T230705]]  (duration: 00m 50s)
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2067, will be moved to m1 [[phab:T230705|T230705]]', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json
* 05:29 elukey: reboot cp2004 due to bnx2x crash (kern.log saved into my home on the host if needed)
 
== 2019-08-18 ==
* 08:28 onimisionipe: running `_cluster/reroute?pretty&explain=true&retry_failed` on eqiad production-search cluster to force allocation of shards
 
== 2019-08-16 ==
* 19:48 sbassett: Deployed security patch for [[phab:T230576|T230576]] (ex:MobileFrontend)
* 18:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 16:38 XioNoX: add BGP sessions to Scaleway (AS12876) in esams
* 16:12 elukey: upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia
* 15:42 elukey: roll restart of druid broker/historicals to pick up new logging/metrics settings
* 14:39 onimisionipe: run `bmc-device --cold-reset; echo $?` in elastic2050 hoping it resets mgmt interface - [[phab:T230597|T230597]]
* 14:24 gehel: rolling reboot of cloudelastic
* 13:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision (beta): Request labels targeting Beta Wikidata (duration: 00m 50s)
* 08:18 _joe_: stopping php on phab1003, to restart it with systemd
* 06:50 _joe_: upgrading envoyproxy across production (http2 CVEs)
* 02:51 vgutierrez: repooling cp5002, running compress.so experiment
 
== 2019-08-15 ==
* 23:35 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to [[phab:T230588|T230588]] (duration: 09m 48s)
* 23:25 smalyshev@deploy1001: Started deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to [[phab:T230588|T230588]]
* 21:54 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@fce8177]: Weekly deploy (duration: 25m 28s)
* 21:28 smalyshev@deploy1001: Started deploy [wdqs/wdqs@fce8177]: Weekly deploy
* 21:27 ebernhardson: finish restarting cloudelastic-chi-eqiad with -XX:NewRatio=3
* 21:18 ebernhardson: increase cloudelastic indices.recovery.max_bytes_per_sec from 40mbit to 512mbit as these have 10G networking
* 21:07 ebernhardson: restart cloudelastic1002 with -XX:NewRatio=3 to match cloudelastic1001
* 20:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:37 ema: depool cp5002 during the EU night, running compress.so experiment
* 19:28 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
* 19:19 sbassett: Deployed security patch for [[phab:T230402|T230402]] (1.34.0-wmf.17)
* 19:18 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:18 sbassett: Deployed security patch for [[phab:T229541|T229541]] (1.34.0-wmf.17)
* 19:17 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 19:17 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:01 ebernhardson: restart elasticsearch on cloudelastic1001 with -XX:NewRatio=3
* 18:51 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 17:58 mbsantos@deploy1001: Finished deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to {{Gerrit|3f1cc72}} ([[phab:T218220|T218220]]) (duration: 00m 43s)
* 17:58 mbsantos@deploy1001: Started deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to {{Gerrit|3f1cc72}} ([[phab:T218220|T218220]])
* 17:47 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to {{Gerrit|5c1da03}} ([[phab:T230067|T230067]] [[phab:T229984|T229984]]) (duration: 05m 53s)
* 17:41 mbsantos@deploy1001: Started deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to {{Gerrit|5c1da03}} ([[phab:T230067|T230067]] [[phab:T229984|T229984]])
* 17:11 ejegg: updated payments-wiki from {{Gerrit|44eae2d65f}} to {{Gerrit|00eb090dcc}}
* 17:02 cstone: civicrm revision changed from {{Gerrit|3caf54a0d2}} to {{Gerrit|9c7b2ffbc9}}
* 16:53 reedy@deploy1001: Synchronized docroot/noc/db.php: Use WmfClusters from separate file (duration: 00m 47s)
* 16:52 reedy@deploy1001: Synchronized src/WmfClusters.php: Move WmfClusters.php (duration: 00m 47s)
* 16:27 XioNoX: advertise core v4 range (208.80.152.0/22) from eqord - [[phab:T167841|T167841]]
* 16:09 ori: Finished messing around with mwdebug1002
* 16:06 reedy@deploy1001: Synchronized docroot/: phpcs fixes (duration: 00m 47s)
* 16:05 reedy@deploy1001: Synchronized wmf-config/arclamp.php: phpcs (duration: 00m 47s)
* 16:04 reedy@deploy1001: Synchronized tests/: phpunit (duration: 00m 47s)
* 16:03 reedy@deploy1001: Synchronized phpcs.xml: more exclusions! (duration: 00m 47s)
* 15:40 ebernhardson: unfreeze writes to cloudelastic cluster
* 15:37 ema: cp5002: re-pool with compress.so cache:false
* 15:34 herron: performing rolling restarts of eqiad kafka-main brokers for security updates
* 15:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
* 15:13 ori: Messing around with CommonSettings.php on mwdebug1002 to profile config loading
* 14:58 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 14:58 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
* 14:56 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 14:52 reedy@deploy1001: Synchronized wmf-config/: phpcs cleanup (duration: 00m 47s)
* 14:51 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
* 14:51 reedy@deploy1001: Synchronized multiversion/: phpcs cleanup (duration: 00m 47s)
* 14:50 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 14:50 ema: cp5002 depool due to compress.so crash
* 14:50 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
* 14:49 reedy@deploy1001: Synchronized phpcs.xml: remove exclusions (duration: 00m 49s)
* 14:47 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 14:44 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
* 14:41 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
* 14:33 papaul: shutting down db2063 for maintenance
* 13:17 reedy@deploy1001: Synchronized phpcs.xml: remove excess lines (duration: 00m 46s)
* 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove account creation restrictions ([[phab:T230304|T230304]], [[phab:T230521|T230521]]) (duration: 00m 48s)
* 12:21 Urbanecm: EU SWAT done
* 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|d036388}}:  Increase default thumb size to 260px on Dutch Wikipedia ([[phab:T215106|T215106]]) (duration: 00m 48s)
* 12:16 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/AbuseFilter/extension.json: SWAT: {{Gerrit|e9422c5}}: Rearrange config to provide better experience ([[phab:T191740|T191740]], [[phab:T200032|T200032]], [[phab:T226987|T226987]]) (duration: 00m 47s)
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: {{Gerrit|7e95f6d}}: Update AbuseFilter config to keep the status quo ([[phab:T191740|T191740]], [[phab:T200032|T200032]], [[phab:T226987|T226987]]) (duration: 00m 49s)
* 12:04 Urbanecm: EU SWAT is going a few minutes out of its window
* 12:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 12:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 12:00 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 12:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 11:37 Urbanecm: Run mwscript namespaceDupes.php --wiki=zhwikisource --add-prefix="FIXME" --fix ([[phab:T230294|T230294]])
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|fe9b6ed}}: Add Portal namespace on zhwikisource ([[phab:T230294|T230294]]) (duration: 00m 47s)
* 11:29 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|377cc53}}: Add new throttle rule for cawiki editathon ([[phab:T230313|T230313]]) (duration: 00m 47s)
* 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove napwikisource from wgProofreadPageNamespaceIds ([[phab:T230541|T230541]]) (duration: 00m 47s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0d8c516}}: Fix addition of Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains ([[phab:T230083|T230083]]) (duration: 00m 48s)
* 10:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T230533|T230533]]: Add more import sources for napwikisource (duration: 00m 52s)
* 08:54 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 08:54 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 08:52 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 08:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 07:35 ema: cp5002: ats-backend-restart to enable compress plugin
* 06:38 ema: wdqs1009: restart wdqs-updater.service
* 00:15 robh: scs-ulsfo offline due to networking issues, rob returning tomorrow with fix [[phab:T230077|T230077]]
* 00:03 twentyafterfour: starting phabricator upgrade to 2019-08-14/1 refs [[phab:T215697|T215697]]
 
== 2019-08-14 ==
* 23:13 ebernhardson: leave cloudelastic writes paused, and dropping from backlog queue, to allow primary clusters to catch up
* 22:41 eileen: civicrm revision changed from {{Gerrit|569e52e23d}} to {{Gerrit|3caf54a0d2}}, config revision is {{Gerrit|1c76e94ac3}}
* 22:38 ebernhardson: freeze writes to cloudelastic for real this time
* 22:03 ejegg: updated fundraising python tools from {{Gerrit|827ce3750e}} to {{Gerrit|5c080bac63}}
* 22:01 robh: starting scs-ulsfo replacement. There will be icinga errors and they are intentionally being allowed so we know when things don't recover properly [[phab:T230077|T230077]]
* 21:37 XioNoX: advertise core v6 range (2620:0:860::/46) from eqord - [[phab:T167841|T167841]]
* 21:30 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:26 ebernhardson: thaw writes to cloudelastic
* 21:24 ejegg: updated payments-wiki from {{Gerrit|9533f70fab}} to {{Gerrit|44eae2d65f}}
* 21:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:13 ebernhardson: apply freeze to cloudelastic writes, to determine if backlog processing can catchup while deferring cloudelastic writes
* 20:49 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 18:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 17:29 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 16:32 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 16:32 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 16:31 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 15:50 ema: cp5002: ats-backend-restart to disable compress plugin while I'm not around
* 15:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 15:41 gehel: powercycling elastic101[789]
* 15:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 14:55 vgutierrez: upgrade nginx to 1.13.9-1wm2 in cp3032
* 14:17 fsero: upgrading envoy package to 1.11.1
* 14:09 vgutierrez: rolling back nginx upgrade in cp3032
* 14:01 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
* 13:58 reedy@deploy1001: Synchronized static/images/project-logos/: [[phab:T210752|T210752]] (duration: 00m 47s)
* 13:56 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T210752|T210752]] (duration: 00m 47s)
* 13:55 reedy@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T212881|T212881]]
* 13:53 reedy@deploy1001: Synchronized dblists/: [[phab:T212881|T212881]] (duration: 00m 48s)
* 12:48 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:47 James_F: <sadtrombone> Wiki creation is still not working correctly, unfortunately.
* Away: We're going to try making a new wiki. [[phab:T212881|T212881]]
* 12:20 vgutierrez: rolling upgrade of nginx to 1.13.9-1+wmf2 in the cache cluster
* 12:17 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 11:20 vgutierrez: repooling cp5002
* 11:19 tarrow: termbox smoketests finished
* 11:06 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 10:46 ema: depool cp5002 after crash. See /var/log/trafficserver/crash-2019-08-14-104502.log
* 10:28 tarrow: Starting smoketest of termbox service on eqiad: [[phab:T229907|T229907]]
* 09:40 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 09:20 ema: cp5002: ats-backend-restart to enable compress plugin
* 08:52 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp1075, cp2001, cp3030 and cp4027 (text) and cp1076, cp2002, cp3034, cp4021 (upload)
* 08:25 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp5001 (upload) and cp5007 (text)
* 08:17 vgutierrez: uploaded nginx-1.13.9-1+wmf2 to apt.wikimedia.org (stretch)
* 08:16 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 08:12 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 08:10 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 07:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2063 from config [[phab:T230459|T230459]] (duration: 00m 47s)
* 07:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2063 from config [[phab:T230459|T230459]] (duration: 00m 48s)
 
== 2019-08-13 ==
* 20:43 ejegg: rolled back payments-wiki from {{Gerrit|9ed8be0532}} to {{Gerrit|9533f70fab}}
* 20:34 ejegg: updated payments-wiki from {{Gerrit|9533f70fab}} to {{Gerrit|9ed8be0532}}
* 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix MachineVision provider config (duration: 00m 47s)
* 19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 19:23 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 [[phab:T220625|T220625]] (duration: 00m 58s)
* 19:22 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 [[phab:T220625|T220625]]
* 19:19 ppchelko@deploy1001: deploy aborted: Revert on canary (duration: 00m 18s)
* 19:18 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Revert on canary
* 19:17 ppchelko@deploy1001: deploy aborted: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 [[phab:T220625|T220625]] (duration: 01m 30s)
* 19:15 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 [[phab:T220625|T220625]]
* 19:03 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 18:50 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 18:41 ebernhardson: set cpufreq scaling_governor to performance on cloudelastic100[1-4] to test any changes to indexing performance
* 18:38 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable MachineVision on Beta (4/4) (duration: 00m 48s)
* 18:34 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable MachineVision on Beta (3/4) (duration: 00m 47s)
* 18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]] (fix perms) (duration: 00m 09s)
* 18:33 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]] (fix perms)
* 18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]] (duration: 00m 43s)
* 18:32 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]]
* 18:32 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]] (duration: 00m 36s)
* 18:31 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: [[phab:T223292|T223292]]
* 18:30 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on Beta (2/4) (duration: 00m 48s)
* 18:27 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Enable MachineVision on Beta (1/4) (duration: 00m 48s)
* 17:44 XioNoX: set target netflow port to 2000 in eqiad
* 17:11 XioNoX: repool eqsin
* 17:06 XioNoX: rollback: disable all peering and transit on cr2-eqsin
* 16:57 XioNoX: reboot cr2-eqsin
* 16:46 XioNoX: disable all peering and transit on cr2-eqsin
* 16:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 16:25 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint [[phab:T211026|T211026]], take 2 (duration: 10m 12s)
* 15:56 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint [[phab:T211026|T211026]], take 2
* 15:56 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint [[phab:T211026|T211026]] (duration: 07m 35s)
* 15:49 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint [[phab:T211026|T211026]]
* 15:46 XioNoX: fail vrrp master to cr1-eqsin
* 15:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 15:39 bblack: puppet re-enabled on lvs1014, lvs1016, icinga1001
* 15:35 XioNoX: depool eqsin for cr2-eqsin upgrade
* 15:32 bblack: disabled puppet on lvs1014, lvs1016, icinga1001 ahead of deploying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528885/ - [[phab:T229621|T229621]]
* 15:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 15:30 XioNoX: rollback ospf + bgp changes on cr2-eqord
* 15:19 XioNoX: restart cr2-eqord - [[phab:T227886|T227886]]
* 15:12 XioNoX: disable all peering and transit on cr2-eqord
* 15:01 XioNoX: increase ospf cost of cr2-eqord<->cr2-eqiad link (+1000)
* 14:57 ema: cp5002: reboot for kernel upgrade
* 14:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 14:42 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 14:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 14:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 14:29 XioNoX: rollback: disable all peering and transit on cr2-eqdfw
* 14:18 XioNoX: reboot cr2-eqdfw for software upgrade - [[phab:T227886|T227886]]
* 14:14 XioNoX: disable all peering and transit on cr2-eqdfw
* 14:04 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:04 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 13:20 jbond42: rolling update of postgresql-9.6
* 13:07 jijiki: rolling restart hhvm on api servers in eqiad
* 12:57 jijiki: Restart hhvm on mw1235
* 12:17 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore{{!}}citoid{{!}}cxserver{{!}}eventgate-analytics{{!}}eventgate-main{{!}}termbox{{!}}blubberoid{{!}}mathoid{{!}}zotero,name=eqiad
* 12:08 _joe_: restarted php-fpm on mw1221
* 12:03 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 12:00 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 11:49 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 11:44 fsero: recreating cxserver blubber and sessionstore namespace - [[phab:T228836|T228836]]
* 11:39 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 11:35 gehel: restart wdqs-blazegraph on wdqs2001
* 11:34 gehel: restart wdqs-updater on wdqs2001
* 11:30 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 11:29 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 11:25 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 11:21 fsero: recreating citoid eventgate-analytics eventgate-main mathoid namespace - [[phab:T228836|T228836]]
* 11:20 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 11:18 raynor: EU SWAT finished
* 11:15 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:529925{{!}}Undeploy editor gender surveys (T227793)]] (duration: 00m 48s)
* 11:13 fsero: recreating termbox namespace - [[phab:T228836|T228836]]
* 11:06 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
* 11:04 fsero: resetting net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 in kubernetes2006
* 10:59 _joe_: [eqiad] downtiming zotero on icinga for 10 minutes while recreating the deployment with helmfile
* 10:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:56 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 10:44 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 10:39 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 10:39 _joe_: recreating rbac roles via helmfile
* 10:32 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:29 _joe_: deleting calico deploy and configmap in kubernetes in eqiad, recreating with helmfile
* 10:25 jbond42: rolling update of ghostscript
* 10:23 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore{{!}}citoid{{!}}cxserver{{!}}eventgate-analytics{{!}}eventgate-main{{!}}termbox{{!}}blubberoid{{!}}mathoid{{!}}zotero,name=eqiad
* 10:10 fsero: initialize_cluster.sh kube-system kubemaster.svc.eqiad.wmnet 6443 - [[phab:T228836|T228836]]
* 10:10 fsero: creating tiller in kube-system for helmfile [[phab:T228836|T228836]]
* 09:58 vgutierrez: upgrading the rest of cache@upload to 8.0.3-1wm3 - [[phab:T221594|T221594]]
* 08:49 marostegui: Stop MySQL on db2057 - [[phab:T230394|T230394]]
* 08:48 marostegui: Remove db2057 from tendril and zarcillo [[phab:T230394|T230394]]
* 07:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2057 from config [[phab:T230394|T230394]] (duration: 00m 47s)
* 07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2057 from config [[phab:T230394|T230394]] (duration: 00m 48s)
* 06:59 volans: upgrading spicerack to 0.0.26 on cumin2001
* 06:49 vgutierrez: Rolling restart of fifo-log-demux and atsmtail services across cache@upload
* 06:38 vgutierrez: upgrading fifo-log-demux to version 0.5 in cache@upload
* 06:11 vgutierrez: Upgrading ATS to 8.0.3-1wm3 in cp2002, cp1076, cp3034 and cp4021 - [[phab:T221594|T221594]]
* 05:47 marostegui: Stop mysql on db2050 - [[phab:T230391|T230391]]
* 05:40 marostegui: Remove db2050 from tendril and zarcillo [[phab:T230391|T230391]]
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2050 from config, host will be decommissioned [[phab:T230391|T230391]]', diff saved to https://phabricator.wikimedia.org/P8904 and previous config saved to /var/cache/conftool/dbconfig/20190813-053514-marostegui.json
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2050 from config [[phab:T230391|T230391]] (duration: 00m 48s)
* 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2050 from config [[phab:T230391|T230391]] (duration: 00m 48s)
* 05:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2122 into s7 [[phab:T228969|T228969]] (duration: 00m 47s)
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2122 into s7 [[phab:T228969|T228969]] (duration: 00m 49s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Provision db2122 into s7 [[phab:T228969|T228969]]', diff saved to https://phabricator.wikimedia.org/P8903 and previous config saved to /var/cache/conftool/dbconfig/20190813-051019-marostegui.json
 
== 2019-08-12 ==
* 23:24 XioNoX: add samplicator to buster-wikimedia repo
* 21:33 eileen: tools revision changed from {{Gerrit|2a56e5e283}} to {{Gerrit|827ce3750e}}
* 20:43 eileen: civicrm revision changed from {{Gerrit|be5b5a150b}} to {{Gerrit|569e52e23d}}, config revision is {{Gerrit|1c76e94ac3}}
* 20:17 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to {{Gerrit|f0a2847}} (duration: 05m 05s)
* 20:12 mbsantos@deploy1001: Started deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to {{Gerrit|f0a2847}}
* 20:08 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 19:15 mforns@deploy1001: Finished deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to {{Gerrit|5418d3be5f65f7325324d0c15c51b3ca722dde1c}} (duration: 39m 23s)
* 19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 18:35 mforns@deploy1001: Started deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to {{Gerrit|5418d3be5f65f7325324d0c15c51b3ca722dde1c}}
* 17:42 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build (duration: 05m 04s)
* 17:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build
* 15:05 jijiki: rolling restart php-fpm on mw122[4-8] - [[phab:T219150|T219150]]
* 15:01 ema: cp1076, cp500[12]: restart trafficserver with compress plugin disabled
* 14:39 jijiki: disable puppet on mw122[4-8]
* 14:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Account creation throttle to 2 everywhere ([[phab:T230304|T230304]]) (duration: 00m 47s)
* 13:51 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:51 ema: cp1076,cp5001,cp5002: ats-backend-restart to disable ATS systemd hardening features
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: More restrictive account creation throttle ([[phab:T230304|T230304]]) (duration: 00m 47s)
* 11:34 vgutierrez: restart atsmtail@backend on cp1076
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable global abuse filters on warwiki as an emergency measure ([[phab:T230304|T230304]]) (duration: 00m 48s)
* 10:59 vgutierrez: restarting trafficserver in cp5002
* 10:47 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5002 - [[phab:T221594|T221594]]
* 10:47 jijiki: Enabling puppet and rolling restarting nginx across the fleet - [[phab:T224538|T224538]]
* 10:39 jijiki: Restarting nginx on mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet,snapshot[1005-1009].eqiad.wmnet,deploy2001.codfw.wmnet,deploy1001.eqiad.wmnet
* 10:28 jijiki: Disable puppet on all servers running a services_proxy - [[phab:T224538|T224538]]
* 10:09 marostegui: Remove empty table globalblocks from s3 (where it exists) - [[phab:T230055|T230055]]
* 10:07 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5001 - [[phab:T221594|T221594]]
* 10:01 marostegui: Remove empty table wikidatawiki.globalblocks from s8 - [[phab:T230055|T230055]]
* 09:36 jijiki: Disable puppet on mwmaint for 425027
* 09:36 marostegui: Remove empty table enwikivoyage.globalblocks from s5 - [[phab:T230055|T230055]]
* 09:32 marostegui: Stop MySQL on db2043 [[phab:T230311|T230311]]
* 09:24 marostegui: Remove empty table testcommonswiki.globalblocks from s4 - [[phab:T230055|T230055]]
* 09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2043 from config [[phab:T230311|T230311]] (duration: 00m 47s)
* 09:22 marostegui: Remove db2043 from tendril and zarcillo [[phab:T230311|T230311]]
* 09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2043 from config [[phab:T230311|T230311]] (duration: 00m 48s)
* 09:06 jijiki: depool and pool back mw1222
* 08:22 elukey: restart Analytics hadoop HDFS namenodes to pick up new heap settings
* 08:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s3 codfw weights [[phab:T220170|T220170]] (duration: 00m 48s)
* 08:07 marostegui@cumin1001: dbctl commit (dc=codfw): 'Reorganize s3 codfw weights [[phab:T220170|T220170]]', diff saved to https://phabricator.wikimedia.org/P8901 and previous config saved to /var/cache/conftool/dbconfig/20190812-080731-marostegui.json
* 07:46 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2105 as s3 codfw master (duration: 00m 47s)
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 codfw master [[phab:T230106|T230106]]', diff saved to https://phabricator.wikimedia.org/P8900 and previous config saved to /var/cache/conftool/dbconfig/20190812-074314-marostegui.json
* 07:34 marostegui: Switchover s3 codfw master db2043 -> db2105 - [[phab:T230106|T230106]]
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2121 into s7', diff saved to https://phabricator.wikimedia.org/P8899 and previous config saved to /var/cache/conftool/dbconfig/20190812-072617-marostegui.json
* 07:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2121 into s7 [[phab:T228969|T228969]] (duration: 00m 47s)
* 07:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2121 into s7 [[phab:T228969|T228969]] (duration: 00m 48s)
* 05:04 marostegui: Remove math table from s3 - [[phab:T196055|T196055]]
* 05:02 marostegui: Remove math table from s1 - [[phab:T196055|T196055]]
 
== 2019-08-11 ==
* 22:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive ([[phab:T230304|T230304]]) (duration: 00m 50s)
 
== 2019-08-10 ==
* 01:49 mutante: mwmaint - running (1 of 8, the one for en) refreshLinks maintenance cron manually to verify it works after switching mwscriptwikiset to PHP7.2 ([[phab:T195392|T195392]])
* 00:52 mutante: mwmaint - running update_flaggedrevs_stats - updates the flagged revs statistics table on each wiki
* 00:47 mutante: mwmaint - running cirrus sanitize jobs maintenance cron
 
== 2019-08-09 ==
* 21:28 mutante: mwmaint - generating new captchas for ConfirmEdit extension by running generatecaptcha maintenance cron command
* 20:55 mutante: mwmaint - running update_special_pages maintenance cron manually
* 20:31 mutante: contint1001 - added entry to /etc/fstab for /mnt/docker to survive reboots (/dev/mapper/contint1001--data-docker /mnt/docker ext4 defaults 0 2)
* 19:46 mutante: mwdebug1001 - temp stopped puppet, editing nginx config to test making it listen on IPv6 for upstream proxies (529401) ([[phab:T224538|T224538]])
* 19:37 mutante: mwmaint - running cirrussearch maintenance jobs manually (completion indices, sanitize cirrus jobs)
* 18:14 elukey: add BGP peer for AS 38758 on cr1-eqsin
* 17:54 mutante: mwmaint - running initsitestats maintenance job - initializes or updates statistics table on all wikis
* 17:23 elukey: set BGP peer "BrightRidge" on cr2-eqiad
* 17:19 mutante: mwmaint - running purgeParserCache maintenance cron manually with PHP 7.2 - ..slowly
* 16:52 mutante: mwmaint - manually running updatePageTriageQueue maintenance cron with PHP 7.2
* 16:15 arturo: add phamhi to 'wmf' and 'ops' LDAP groups ([[phab:T228942|T228942]])
* 15:48 jijiki: Disable puppet on mw1222 and depool
* 11:50 ema: root@puppetmaster2001:/srv/private# su -c "export GIT_SSH=/srv/private/.git/ssh_wrapper.sh ; git push ssh://puppetmaster1001.eqiad.wmnet/srv/private master" gitpuppet
* 11:44 ema: puppetmaster1001: resetting last 3 /srv/private commits due to broken replication
* 10:38 thcipriani: gerrit restart on cobalt.
* 09:36 marostegui: Drop math table from s7 [[phab:T196055|T196055]]
* 09:04 marostegui: Drop math table from s4 - [[phab:T196055|T196055]]
* 08:58 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011 - [[phab:T196055|T196055]]
* 08:51 moritzm: upgrading ghostscript on thumbor1001
* 08:32 marostegui: Stop MySQL on db2069 [[phab:T230107|T230107]]
* 08:29 marostegui: Remove db2069 from tendril and zarcillo [[phab:T230107|T230107]]
* 08:24 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - [[phab:T196055|T196055]]
* 07:31 vgutierrez: uploaded trafficserver-8.0.3wm3 to apt.wikimedia.org (stretch) - [[phab:T220383|T220383]] [[phab:T228135|T228135]]
* 06:19 elukey: powercycle thumbor2004 (no ssh, serial console showing a frozen OS)
* 05:37 marostegui: Run maintain-views script with --clean to clean up math table views - [[phab:T196055|T196055]]
* 02:30 mutante: mwmaint1002 - manually running cleanup_upload_stash maintenance cron to confirm no issues with PHP 7.2 in maintenance/cleanupUploadStash.php
* 02:24 mutante: mwmaint1002 - manually running purge_expired_userrights maintenance cron to confirm no issues with PHP 7.2 in maintenance/purgeExpiredUserrights.php
* 02:17 mutante: mwmaint1002 - manually running purge_abusefilter maintenance cron
 
== 2019-08-08 ==
* 23:50 Urbanecm: Evening SWAT done
* 23:49 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/WikiEditor/modules/jquery.wikiEditor.dialogs.config.js: SWAT: {{Gerrit|6dcab39}}: Follow-up {{Gerrit|Ia75d685c}}: Fix the insert file dialog ([[phab:T230078|T230078]]) (duration: 00m 50s)
* 23:48 mutante: mwmaint1002 - manually running purge_securepoll maintenance script
* 23:42 mutante: mwmaint1002 - manually running TranslationNotifications DigestEmailer maintenance cron
* 22:05 mutante: rolling out new scap version 3.12.0-1 on all of eqiad
* 22:02 mutante: mwdebug2002 - scap pull to test new scap, nothing to do
* 22:00 mutante: rolling out new scap version 3.12.0-1 on all of codfw
* 21:54 mutante: (purge unpublished articles from ContentTranslation older than 455 days)
* 21:52 mutante: mwmaint1002 - manually running purge_old_cx_drafts maintenance job for ContentTranslation - after switching helper script to PHP 7.2
* 21:50 mutante: mwmaint1002 - manually running purgeUnusedProjects with PageAssessments extension to confirm no issues after switch to PHP7.2
* 21:40 mutante: mwmaint1002 - manually running (weekly) echo_mail cron job (user notifications) to confirm it works after switching foreachwikiindblist to use php7.2 ([[phab:T195392|T195392]])
* 21:30 mutante: rolling out new scap package 3.12.0-1 on mw-canary servers via debdeploy ([[phab:T230144|T230144]])
* 21:28 mutante: rolling out new scap package 3.12.0-1 on contint servers
* 21:22 mutante: built new scap version 3.12.0-1 on boron, imported packages on install1002 (apt.wm.org), copied from stretch to jessie and buster ([[phab:T230144|T230144]])
* 20:33 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:36 thcipriani: restart gerrit on cobalt to pick up new config
* 19:34 thcipriani: restart gerrit-replica on gerrit2001 to pick up new config
* 19:27 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.17
* 17:52 XioNoX: run /usr/local/sbin/restart-php7.2-fpm on mwdebug1001
* 17:33 fdans@deploy1001: Finished deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt (duration: 16m 52s)
* 17:21 XioNoX: add user jbond to network devices
* 17:16 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt
* 16:56 ppchelko@deploy1001: Finished deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events [[phab:T228688|T228688]] (duration: 01m 24s)
* 16:55 ppchelko@deploy1001: Started deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events [[phab:T228688|T228688]]
* 16:40 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploying analytics refinery
* 15:49 XioNoX: set virtual-chassis vcp-snmp-statistics to all VC - [[phab:T228824|T228824]]
* 15:13 herron: rebooting fermium (lists) for security updates
* 15:11 XioNoX: commit synchronize on cr1-codfw - [[phab:T226422|T226422]]
* 14:52 XioNoX: continue cr1-codfw:re1 replacement - [[phab:T226422|T226422]]
* 13:09 marostegui: Drop table math from s8 [[phab:T196055|T196055]]
* 12:15 tarrow: EU midday SWAT done
* 12:15 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: [[gerrit:529059{{!}}Add hook to invalidate cache entries missing TermboxOption (T228978)]] (duration: 01m 14s)
* 12:01 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: [[gerrit:529055{{!}}Split ParserCache on Termbox (T228978)]] (duration: 01m 21s)
* 12:00 tarrow: Running SWAT a little over time because late start and slow jenkins
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|dfeb2a9}}: HD logo for enwikivoyage ([[phab:T230114|T230114]]) (duration: 00m 56s)
* 11:44 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|dfeb2a9}}: HD logo for enwikivoyage ([[phab:T230114|T230114]]) (duration: 00m 56s)
* 11:31 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zhwikisource.png ([[phab:T229715|T229715]])
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|be886ad}}: Add hd variations for zhwikisource project logo ([[phab:T229715|T229715]]) (duration: 00m 55s)
* 11:28 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|be886ad}}: Add hd variations for zhwikisource project logo ([[phab:T229715|T229715]]) (duration: 00m 56s)
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|9a4494a}}: Add Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains ([[phab:T230083|T230083]]) (duration: 00m 57s)
* 11:05 Urbanecm: Run scap pull on mwdebug1001 to revert local modifications ([[phab:T207627|T207627]])
* 10:53 jijiki: Disable puppet, depool and pool mw1221, mw1222, mw1223 for 529061
* 10:46 Urbanecm: Set $wgContentHandlers["flow-board"] = $wgContentHandlers["wikitext"]; locally on mwdebug1001 to fix few bad pages ([[phab:T207627|T207627]])
* 10:43 moritzm: installing exim4 security updates on buster hosts (our exim config is not vulnerable)
* 09:41 moritzm: installing OpenJDK security updates on WDQS servers
* 09:30 jbond42: disabling puppet fleet wide
* 09:26 marostegui: Drop table math from labswiki (wikitech) and labtestwiki [[phab:T196055|T196055]]
* 09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2069 from config [[phab:T230107|T230107]] (duration: 00m 55s)
* 09:19 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2069 from config [[phab:T230107|T230107]] (duration: 00m 57s)
* 08:45 elukey: restart hadoop namenodes on an-master100* to pick up new GC settings (CMS -> G1 switch)
* 08:44 moritzm: installing OpenJDK security updates on elastic* servers
* 08:36 marostegui: Remove math table from s5 [[phab:T196055|T196055]]
* 08:13 marostegui: Stop MySQL on db2065 to test dbproxy2003
* 07:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2096 as codfw x1 master [[phab:T220170|T220170]] (duration: 00m 57s)
* 07:39 marostegui: Switchover x1 codfw master db2069 -> db2096 [[phab:T220170|T220170]]
* 06:40 _joe_: restarting php-fpm on the application servers to pick up the change
* 05:54 marostegui: Stop MySQL on db2035 for decommissioning [[phab:T229784|T229784]]
* 05:52 marostegui: Remove db2035 from tendril and zarcillo [[phab:T229784|T229784]]
* 00:48 mutante: mwdebug2002 - sudo -i restart-php7.2-fpm
* 00:20 ejegg: re-enabled both recurring charge jobs
* 00:02 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: hack for Parsoid testing on scandium (duration: 00m 55s)
 
== 2019-08-07 ==
* 23:58 tstarling@deploy1001: Synchronized w/rest.php: Creating rest.php endpoint disabled by default (duration: 00m 55s)
* 23:46 ejegg: disabled newer recurring charge job to test one at a time on existing recur records
* 23:22 mutante: elastic2054 - powercycling after it went down unexpectedly and Icinga alerted, this happened before in [[phab:T227298|T227298]]
* 23:08 XioNoX: set virtual-chassis vcp-snmp-statistics on asw2-ulsfo - [[phab:T228824|T228824]]
* 23:07 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]]: Send writes for all non-private wikis to cloudelastic (duration: 01m 02s)
* 23:03 XioNoX: set virtual-chassis vcp-snmp-statistics on asw-a-codfw - [[phab:T228824|T228824]]
* 22:50 ebernhardson: mwmaint start cirrussearch saneitize.php against all non-private group1 wikis for cloudelastic cluster
* 22:48 mutante: mwmaint1002 - manually running the purgeOldData cron command to verify it with PHP 7.2 for 528730 ([[phab:T195392|T195392]])
* 22:12 jgleeson: switched on all fundraising process-control except ingenico_recurring_charge
* 21:50 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition [[phab:T230049|T230049]] [[phab:T230048|T230048]] (duration: 00m 59s)
* 21:49 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition [[phab:T230049|T230049]] [[phab:T230048|T230048]]
* 21:25 mutante: restarting gerrit service to apply config change (528769)
* 21:00 ebernhardson: apply transient logger settings from prod search clusters to cloudelastic
* 20:34 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .17
* 20:34 jgleeson: updated civicrm from {{Gerrit|727a2c193b}} to {{Gerrit|be5b5a150b}}
* 20:32 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .16 temporarily
* 20:28 jgleeson: switched off fundraising process-control jobs
* 19:36 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.17 (duration: 00m 54s)
* 19:35 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.17
* 19:16 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Switch property terms migration to WRITE_NEW on client wikis [[phab:T225053|T225053]] (duration: 00m 58s)
* 18:15 jijiki: Restart hhvm and php-fpm on canary mw hosts
* 17:54 shdubsh: install2002 add fstab entry for /srv mount - [[phab:T229997|T229997]]
* 17:46 shdubsh: install2002 stop nginx and squid for resync /srv to spare disk and restore mount - [[phab:T229997|T229997]]
* 17:42 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Revert "Switch high-traffic jobs to eventgate." (duration: 00m 58s)
* 16:40 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: JobQueue: Revert switching high-traffic jobs to eventgate (duration: 00m 55s)
* 16:34 mobrovac@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 16:00 thcipriani: restarting jenkins for update
* 15:58 jijiki: restart nrpe on stat1004
* 15:08 _joe_: freeing APCu on mw1270, which has degraded performance
* 14:24 marostegui: Reboot dbproxy2003 for kernel upgrades
* 14:16 jbond42: puppet *now* re-enabled
* 14:16 jbond42: puppet not re-enabled
* 14:01 jbond42: disable puppet fleet wide for puppetdb restart
* 13:57 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database (instance table), as those hosts were decommissioned months ago
* 13:55 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database, as those hosts were decommissioned months ago
* 13:48 marostegui: Apply grants for dbproxy2003 on m3 - [[phab:T202367|T202367]]
* 13:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid backend settings
* 11:48 Amir1: EU SWAT is done
* 11:37 kart_: Updated cxserver to 2019-08-06-100812-production ([[phab:T227571|T227571]])
* 11:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:527087{{!}}Switch property terms migration to WRITE_NEW on client wikis (T225053)]] (duration: 00m 56s)
* 11:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:26 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:528458{{!}}Enable AMC on all wikipedias (T228916)]] (duration: 00m 55s)
* 11:26 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:09 marostegui: Restart gerrit
* 10:11 moritzm: deleting poolcounter1001, poolcounter1003, poolcounter2001, poolcounter2002 in Ganeti ([[phab:T224572|T224572]])
* 10:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:03 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 09:14 marostegui: Drop math table from s6 - [[phab:T196055|T196055]]
* 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2131 into x1 [[phab:T228969|T228969]] (duration: 00m 55s)
* 08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2131 into x1 [[phab:T228969|T228969]] (duration: 00m 56s)
* 08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:37 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2130 into s1 - [[phab:T228969|T228969]]', diff saved to https://phabricator.wikimedia.org/P8877 and previous config saved to /var/cache/conftool/dbconfig/20190807-080059-marostegui.json
* 07:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8876 and previous config saved to /var/cache/conftool/dbconfig/20190807-073349-marostegui.json
* 07:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:31 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2130 into s1 [[phab:T228969|T228969]] (duration: 00m 56s)
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2130 into s1 [[phab:T228969|T228969]] (duration: 00m 55s)
* 05:57 marostegui: Stop MySQL on db1071 - [[phab:T229381|T229381]]
* 05:55 marostegui: Remove db1071 from tendril and zarcillo - [[phab:T229381|T229381]]
* 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1071 from config [[phab:T229381|T229381]] (duration: 00m 55s)
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1071 from config [[phab:T229381|T229381]] (duration: 00m 57s)
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8875 and previous config saved to /var/cache/conftool/dbconfig/20190807-053903-marostegui.json
* 00:48 mutante: restarting gerrit to apply config change 528276 to exclude some projects from github replication
* 00:21 mutante: gerrit2001 - restarting gerrit to apply 528276
 
== 2019-08-06 ==
* 23:51 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki ([[phab:T229769|T229769]]) (duration: 00m 56s)
* 23:50 catrope@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature ([[phab:T229795|T229795]]) (duration: 00m 55s)
* 23:49 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature ([[phab:T229795|T229795]]) (duration: 00m 56s)
* 23:36 mutante: phabricator - added ssingh to acl*sre-team (group 29), WMF-NDA-requests (group 974) and WMF-NDA (group 61) ([[phab:T229860|T229860]])
* 23:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update HD logos for enwikisource and sourceswiki ([[phab:T229769|T229769]]) (duration: 00m 55s)
* 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki ([[phab:T229769|T229769]]) (duration: 00m 56s)
* 23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch updateBetaFeaturesUserCounts job to eventgate ([[phab:T228705|T228705]]) (duration: 00m 57s)
* 23:12 eileen: civicrm revision changed from {{Gerrit|2e03f9bb1e}} to {{Gerrit|727a2c193b}}, config revision is {{Gerrit|84b785d41c}}
* 22:33 ebernhardson: restart mjolnir-kafka-daemon across all elasticsearch servers
* 22:25 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 05m 35s)
* 22:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift
* 21:53 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 16m 35s)
* 21:36 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift
* 21:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 01m 50s)
* 21:34 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift
* 21:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:17 subbu: repooled wtp2019 ( after papaul finished upgrade as part of [[phab:T221572|T221572]] )
* 19:52 papaul: shutting down wtp2019 for firmware upgrade
* 19:50 herron: disabling puppet on logstash collectors for rolling deploy of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528306/ [[phab:T166107|T166107]]
* 19:42 subbu: depooled wtp2019 ( to assist papaul with [[phab:T221572|T221572]] )
* 19:22 thcipriani: gerrit restart on cobalt
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.17
* 18:38 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache (duration: 19m 02s)
* 18:19 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache
* 18:13 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.14 [keeping static files] (duration: 08m 28s)
* 17:37 accraze@deploy1001: Finished deploy [ores/deploy@d08fa62]: [[phab:T229848|T229848]] (duration: 17m 21s)
* 17:20 accraze@deploy1001: Started deploy [ores/deploy@d08fa62]: [[phab:T229848|T229848]]
* 17:14 volans: uploaded spicerack_0.0.26-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 16:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=codfw
* 16:52 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 16:50 brennen: cutting branch for 1.34.0-wmf.17
* 16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=codfw
* 16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw
* 16:48 @: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' .
* 16:47 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=mathoid,name=codfw
* 16:43 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw
* 16:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]]: Re-sync enable group1 on cloudelastic, job runners are claiming it's not enabled while app servers are sending jobs (duration: 00m 47s)
* 16:39 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 16:37 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:36 @: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 16:33 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 16:32 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 16:31 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 16:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]]: Turn on cloudelastic writes for group1 (duration: 00m 47s)
* 16:08 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=citoid,name=codfw
* 15:13 moritzm: installing bind9 security updates (client-side tools/libs only) for jessie
* 15:04 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s all
* 14:55 moritzm: rebooting mwlog1001 for kernel update
* 14:55 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin -p99 -b100 'A:all' 'apt-get update'
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:52 herron: restarting logstash service on logstash1007 to pick up puppet managed log4j2 config
* 14:50 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s mw-canary
* 14:45 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.1.4-2+deb10u1_amd64.changes
* 14:44 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include stretch-wikimedia conftool_1.1.4-2_amd64.changes
* 14:37 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥 sudo -E reprepro -C main include jessie-wikimedia conftool_1.1.4-2+deb8u1_amd64.changes
* 14:36 marostegui: Start mysql on db1100 after on-site maintenance - [[phab:T228732|T228732]]
* 12:30 elukey: roll restart cassandra on aqs for openjdk-8 upgrades
* 12:06 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:49 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:49 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:36 Urbanecm: EU SWAT done
* 11:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: {{Gerrit|8cc96db}}: Better handling of DNONE ([[phab:T214674|T214674]], [[phab:T228677|T228677]]) (duration: 00m 48s)
* 11:11 moritzm: rebooting install1002 to pick up MDS-enabled qemu
* 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:528405{{!}}Disable EntitySchema in production wikidata ]] (duration: 00m 48s)
* 10:52 moritzm: rebooting install2002 to pick up MDS-enabled qemu
* 10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:07 moritzm: rebooting etherpad1001 to pick up MDS-enabled qemu
* 10:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:59 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:59 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:58 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 08:39 marostegui: Add db2130 to tendril and zarcillo [[phab:T228969|T228969]]
* 08:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 07:27 marostegui: Stop MySQL on db1100 before powering the host off - [[phab:T228732|T228732]]
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool for firmware and BIOS upgrade [[phab:T228732|T228732]]', diff saved to https://phabricator.wikimedia.org/P8869 and previous config saved to /var/cache/conftool/dbconfig/20190806-072720-marostegui.json
* 07:10 onimisionipe: pool maps1001. Postgres init complete - [[phab:T229788|T229788]]
* 05:59 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CheckUser: Fix [[phab:T229893|T229893]] (duration: 00m 47s)
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2127 into s3 [[phab:T228969|T228969]]', diff saved to https://phabricator.wikimedia.org/P8868 and previous config saved to /var/cache/conftool/dbconfig/20190806-055357-marostegui.json
* 05:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2127 into s3 [[phab:T228969|T228969]] (duration: 00m 48s)
* 05:34 marostegui: Restart wikibugs
* 05:06 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1010 [[phab:T222978|T222978]]
* 03:58 ebernhardson: start importing group[12] to cloudelastic from mwmaint1002
* 02:08 eileen: civicrm revision changed from {{Gerrit|857dcc9461}} to {{Gerrit|2e03f9bb1e}}, config revision is {{Gerrit|84b785d41c}}
* 02:05 MaxSem: Creating local accounts for Community Tech bot on every Wikipedia
 
== 2019-08-05 ==
* 23:34 mutante: mwmaint1002 - remove getJobQueueLengths.php from www-data's crontab ([[phab:T195392|T195392]])
* 23:03 Urbanecm: Evening SWAT done
* 23:03 urbanecm@deploy1001: Synchronized wmf-config/ProductionServices.php: SWAT: {{Gerrit|87b428d}}: Repoint cloudelastic at LB dns ([[phab:T220625|T220625]]) (duration: 00m 48s)
* 21:55 papaul: powering down wtp2011 for BIOS upgrade
* 21:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s all
* 21:35 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s eqsin
* 21:29 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin -p99 -b100 'A:all' 'apt-get update'
* 21:28 mutante: 🔔 scandium - re-enabled icinga notifications for various services
* 21:27 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s mw-canary
* 21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include jessie-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb8u1_amd64.changes
* 21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include buster-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb10u1_amd64.changes
* 21:24 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠 sudo -E reprepro -C main include stretch-wikimedia conftool-1.1.4-1/conftool_1.1.4-1_amd64.changes
* 21:22 ebernhardson: start importing group0 to cloudelastic from mwmaint1002
* 20:49 ebernhardson: nuke all search indices on cloudelastic preparing for fresh imports and live updates [[phab:T220625|T220625]]
* 20:34 arlolra: Updated Parsoid to {{Gerrit|7232dff}} ([[phab:T228223|T228223]])
* 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@d3a2937]: Updating Parsoid to {{Gerrit|7232dff}} (duration: 09m 02s)
* 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@d3a2937]: Updating Parsoid to {{Gerrit|7232dff}}
* 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e774a05]: Update mobileapps to {{Gerrit|c713c2e}} (duration: 04m 51s)
* 20:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e774a05]: Update mobileapps to {{Gerrit|c713c2e}}
* 19:51 gehel: depool wdqs1005 - [[phab:T229876|T229876]]
* 19:35 thcipriani: gerrit restart on cobalt for configuration updates
* 19:34 bblack: fixing up cloudelastic LVS IPv6 stuff on lvs1014, lvs1016, cloudelastic* - possible monitoring noise
* 19:33 thcipriani: gerrit restart for gerrit-replica on gerrit2001
* 18:44 Urbanecm: Morning SWAT done
* 18:39 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: {{Gerrit|d358f17}}: Revert "Better handling of DNONE" ([[phab:T214674|T214674]], [[phab:T228677|T228677]]) (duration: 00m 47s)
* 18:32 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: {{Gerrit|936a462}}: Better handling of DNONE ([[phab:T214674|T214674]], [[phab:T228677|T228677]]) (duration: 00m 47s)
* 18:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/: SWAT: {{Gerrit|3ee0e84}}: Temporarily log search to two schemas (duration: 00m 47s)
* 18:25 Urbanecm: Deployed patch for [[phab:T207094|T207094]]
* 18:21 urbanecm@deploy1001: Synchronized dblists/: SWAT: {{Gerrit|a9e4ed8}}: Remove related-articles-footer-blacklisted-skins.dblist ([[phab:T229644|T229644]], 3/3) (duration: 00m 46s)
* 18:20 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|a9e4ed8}}: Remove related-articles-footer-blacklisted-skins.dblist ([[phab:T229644|T229644]], 2/3) (duration: 00m 47s)
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|a9e4ed8}}: Remove related-articles-footer-blacklisted-skins.dblist ([[phab:T229644|T229644]], 1/3) (duration: 00m 49s)
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|254ecc1}}: Switch testwiki to use kask (only) for sessions ([[phab:T222099|T222099]]) (duration: 00m 48s)
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e44a6e6}}: Enable editor gender surveys ([[phab:T227793|T227793]]) (duration: 00m 48s)
* 18:06 onimisionipe: reinit postgres on maps1001 - [[phab:T229788|T229788]]
* 17:33 jijiki: Pool restbase2009 - [[phab:T227408|T227408]]
* 17:28 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore{{!}}citoid{{!}}cxserver{{!}}eventgate-analytics{{!}}eventgate-main{{!}}termbox{{!}}blubberoid{{!}}mathoid{{!}}zotero,name=codfw
* 16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 16:52 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:37 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 16:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 16:18 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 16:16 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 16:10 fsero: recreating citoid eventgate-analytics eventgate-main mathoid sessionstore namespaces and redeploying from helmfile [[phab:T228837|T228837]]
* 16:06 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
* 16:04 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 16:02 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 15:58 Urbanecm: Deploy patch for [[phab:T200104|T200104]]
* 15:41 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
* 15:36 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
* 15:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 15:27 fsero: recreating zotero and termbox  namespaces and services from helmfile codfw - [[phab:T228837|T228837]]
* 15:26 fsero: recreating zotero and termbox from helmfile codfw - [[phab:T228837|T228837]]
* 15:21 marostegui: Add db2127 to tendril and zarcillo (s3) - [[phab:T228969|T228969]]
* 15:18 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
* 14:32 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1010 [[phab:T222978|T222978]]
* 14:24 papaul: shut down restbase2009 for battery replacement
* 14:12 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore{{!}}citoid{{!}}cxserver{{!}}eventgate-analytics{{!}}eventgate-main{{!}}termbox{{!}}blubberoid{{!}}mathoid{{!}}zotero,name=codfw
* 14:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 14:06 jijiki: Depool and restart restbase2009 for maint - [[phab:T227408|T227408]]
* 14:05 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 14:04 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:00 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:57 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:56 fsero: deploying calico controller  in codfw via helmfile - [[phab:T228837|T228837]]
* 13:42 fsero: deploying tiller in kube-system for helmfile changes - [[phab:T228837|T228837]]
* 13:37 volans: run cumin 'A:cumin' 'rm -v /usr/local/sbin/{wmf-upgrade-varnish,wmf-upgrade-and-reboot,wmf-downtime-host,wmf-decommission-host}' [[phab:T205886|T205886]]
* 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 13:16 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 13:01 jbond42: rolling update of openjdk-8 on restbase
* 12:44 moritzm: restarting cassandra on restbase-dev1004
* 12:44 moritzm: restarting cassandra on restbase-dev1040
* 12:33 moritzm: uploaded openjdk-8 u222 for jessie-wikimedia
* 12:26 Krinkle: mwscript deleteEqualMessages.php --wiki fywiktionary (requested at [[m:Steward_requests/Miscellaneous]])
* 12:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526657{{!}}Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 48s)
* 12:01 Urbanecm: EU SWAT done
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0032b0a}}: Enable Page Previews as default on hewikivoyage ([[phab:T222017|T222017]]) (duration: 00m 47s)
* 11:43 jbond@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 11:43 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 11:42 jbond@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
* 11:42 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 11:38 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/MobileFrontend/: SWAT: {{Gerrit|b7ae4fb}}: Revert "[AMC] [desktop] [mobile] use AMC by default for desktop users" ([[phab:T229722|T229722]]) (duration: 00m 49s)
* 11:33 marostegui: Upgrade MySQL on db2074 db2057 db2050 db2035 db2098
* 11:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: SWAT: {{Gerrit|3ecaa57}}: Add only needed entity usages in AddUsagesForPageJob ([[phab:T226818|T226818]], [[phab:T205045|T205045]]) (duration: 01m 12s)
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|9eb74c2}}: Define import sources for fawiki ([[phab:T229717|T229717]]) (duration: 00m 48s)
* 10:40 jbond42: update java on sessionstore
* 10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:528092{{!}} Bumping portals to master (T128546)]] (duration: 00m 46s)
* 10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:528092{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:27 ema: upload fifo-log-demux 0.5 to stretch-wikimedia
* 10:12 jbond42: rolling update of openjdk on maps servers
* 09:30 marostegui: Stop MySQL on db2105 to change binlog format
* 09:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 09:07 arturo: downtime toolschecker for 5 hours
* 09:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 08:56 moritzm: installing vim security updates for jessie (stretch/buster already fixed)
* 08:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2035 from config [[phab:T229784|T229784]] (duration: 00m 46s)
* 08:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2035 from config [[phab:T229784|T229784]] (duration: 00m 47s)
* 08:43 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 08:32 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8861', previous config saved to /var/cache/conftool/dbconfig/20190805-083254-marostegui.json
* 08:21 marostegui: Switchover s2 codfw master from db2035 to db2107 - [[phab:T221533|T221533]] [[phab:T220170|T220170]]
* 07:53 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s2 [[phab:T228969|T228969]] (duration: 00m 47s)
* 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Reorganize s2 [[phab:T228969|T228969]] (duration: 00m 48s)
* 07:52 marostegui@deploy1001: sync-file aborted: Reorganize s2 [[phab:T228969|T228969]] (duration: 00m 06s)
* 07:49 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8859', previous config saved to /var/cache/conftool/dbconfig/20190805-074930-marostegui.json
* 07:45 moritzm: installing unzip regression DLA for jessie
* 07:43 moritzm: removed orespoolcounter[12]00[12] from debmonitor [[phab:T227640|T227640]]
* 07:23 marostegui: Move db2095:3312 from db2063 to db2126 - [[phab:T228969|T228969]]
* 05:58 marostegui: Update rack column on zarcillo.servers for the new servers [[phab:T229683|T229683]]
* 05:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2124 into s6 [[phab:T228969|T228969]] (duration: 00m 46s)
* 05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2124 into s6 [[phab:T228969|T228969]] (duration: 00m 49s)
* 05:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8858', previous config saved to /var/cache/conftool/dbconfig/20190805-052839-marostegui.json
 
== 2019-08-04 ==
* 18:45 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: labs-only noop - {{Gerrit|f740f89c594979}} (duration: 00m 50s)
 
== 2019-08-03 ==
* 12:02 gilles: purging ruwiki articles on mwmaint1002
* 11:30 gilles: purging eswiki articles on mwmaint1002
* 10:01 ema: cp1085: restart varnish-be
* 09:36 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216499|T216499]] [[phab:T216594|T216594]] Renew origin trial tokens (duration: 00m 48s)
* 00:40 ejegg: rolled back fundraising python tools from {{Gerrit|493a38f9e0}} to {{Gerrit|2a56e5e283}}
 
== 2019-08-02 ==
* 23:58 mutante: scandium - apt-get remove --purge prometheus-hhvm-exporter - not needed here, no HHVM ([[phab:T228069|T228069]])
* 23:16 XioNoX: Make the Level3 link between eqiad-knams primary - [[phab:T228827|T228827]]
* 23:06 mutante: mwdebug1001/mwdebug1002 - restart-php7.2-fpm - low opcache
* 20:48 sbassett: Deployed security patch for [[phab:T229541|T229541]]
* 20:14 Urbanecm: Run mwscript deleteEqualMessages.php --wiki=cswiki --delete
* 19:24 mutante: gerrit2001 - re-enabling puppet, starting as slave for the first time ever, thanks to codfw dbproxy, gerrit service running  ([[phab:T176532|T176532]])
* 18:37 mutante: gerrit2001 - disabling puppet, stopping gerrit service
* 18:36 mutante: adding gerrit2001 to ferm rules on dbproxy for misc
* 18:14 Lucas_WMDE: recached all WikibaseView messages in ResourceLoader for [[phab:T229604|T229604]], cf. https://w.wiki/6kc
* 17:46 XioNoX: flap NTT link in eqsin
* 17:42 lucaswerkmeister-wmde@deploy1001: Finished scap: Fix WikibaseView i18n globals ([[phab:T229604|T229604]]) (duration: 16m 51s)
* 17:26 XioNoX: add avoid_path to cr1/2-eqsin
* 17:25 lucaswerkmeister-wmde@deploy1001: Started scap: Fix WikibaseView i18n globals ([[phab:T229604|T229604]])
* 17:19 krinkle@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|a75d23ecb1b}} (duration: 00m 47s)
* 17:10 krinkle@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|ee528e886268c08e9377fbd764ec861b09adfc73}} (duration: 00m 48s)
* 16:42 XioNoX: replace rhenium with netflow1001 netflow target + iBGP peer on all routers
* 15:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@250f711]: Fix MCS production crashers ([[phab:T229521|T229521]], [[phab:T229630|T229630]]) (duration: 04m 41s)
* 15:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@250f711]: Fix MCS production crashers ([[phab:T229521|T229521]], [[phab:T229630|T229630]])
* 15:14 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 15:12 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 14:14 mforns@deploy1001: Finished deploy [analytics/refinery@b50a939]: deploying refinery up to {{Gerrit|b50a93955952ed863d5ef7703a91ab59f5d979cf}} (rollback of cassandra and edit_hourly hive2 actions to unbreak production) (duration: 16m 47s)
* 13:57 mforns@deploy1001: Started deploy [analytics/refinery@b50a939]: deploying refinery up to {{Gerrit|b50a93955952ed863d5ef7703a91ab59f5d979cf}} (rollback of cassandra and edit_hourly hive2 actions to unbreak production)
* 13:54 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:45 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=api_appserver,dc=eqiad,service=nginx,name=mw12[23].*
* 12:33 marostegui: Restarted wikibugs a few minutes ago as it was not sending anything on IRC
* 11:56 Amir1: aborted l10nupdate
* 11:54 Amir1: start of l10nupdate
* 11:48 ladsgroup@deploy1001: scap sync-l10n completed (1.34.0-wmf.16) (duration: 00m 44s)
* 11:39 ladsgroup@deploy1001: Finished scap: [[phab:T229604{{!}}Rebuilding l10n cache]] (duration: 05m 06s)
* 11:34 ladsgroup@deploy1001: Started scap: [[phab:T229604{{!}}Rebuilding l10n cache]]
* 10:51 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: [[gerrit:527501{{!}}Revert "fix eslint errors in lib after moving submodule files into lib"]] (duration: 01m 08s)
* 10:01 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 09:22 marostegui: Compress s7 on labsdb1010 - [[phab:T222978|T222978]]
* 09:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526657{{!}}Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 48s)
* 09:12 elukey: umount /sys/kernel/debug/tracing on analytics1043
* 08:57 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 08:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to s6 (duration: 00m 46s)
* 07:56 marostegui@cumin2001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8852', previous config saved to /var/cache/conftool/dbconfig/20190802-075548-marostegui.json
* 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2129 to the config [[phab:T228969|T228969]] (duration: 00m 47s)
* 07:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to the config [[phab:T228969|T228969]] (duration: 00m 47s)
* 07:43 marostegui: Restart hhvm on mw1226
* 07:40 _joe_: restarting php-fpm on mw1270, with 80 pms - static, apc 6 GB no ttl
* 07:38 _joe_: disabling puppet on mw1270 for testing of different php settings
* 07:21 marostegui: Add db2124 to tendril and zarcillo [[phab:T228969|T228969]]
* 07:00 _joe_: running systemd-tmpfiles --create nutcracker.conf on scandium
* 06:46 vgutierrez: upgrading acme-chief to version 0.20 in acme-chief test instances - [[phab:T229096|T229096]]
* 05:21 vgutierrez: uploaded acme-chief 0.20 to apt.wikimedia.org (buster) - [[phab:T229096|T229096]]
* 05:10 marostegui: Stop MySQL on db2058 for decommissioning [[phab:T229543|T229543]]
* 05:06 marostegui: Remove db2058 from tendril and zarcillo [[phab:T229543|T229543]]
 
== 2019-08-01 ==
* 23:32 Urbanecm: Evening SWAT done
* 23:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|819073a}}: Add `autopatrolled` group to az wikisource ([[phab:T229371|T229371]]) (duration: 00m 49s)
* 23:29 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|8aca0eb}}: Remove the "autoreview" user group from ru.wikipedia ([[phab:T229596|T229596]]) (duration: 00m 47s)
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cf01272}}: Add importing to english wikiquote ([[phab:T228607|T228607]]) (duration: 00m 48s)
* 23:10 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: [[phab:T229614|T229614]]: Pass proper types to eventlogging to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
* 22:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to {{Gerrit|2ee48ab}} (duration: 04m 34s)
* 22:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to {{Gerrit|2ee48ab}}
* 22:17 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/extension.json: [[phab:T229614|T229614]]: Update eventlogging schema version to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
* 22:13 mutante: scandium apt-get autoremove
* 22:13 mutante: scandium apt-get remove --purge wikimedia-lvs-realserver ([[phab:T228069|T228069]])
* 21:48 mutante: scandium - apt-get remove --purge hhvm* ([[phab:T228069|T228069]])
* 21:23 brennen@deploy1001: Synchronized php: group1 and group2 to 1.34.0-wmf.16 (duration: 00m 46s)
* 21:22 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.16
* 20:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/Revision/RevisionRenderer.php: [[phab:T229589|T229589]] - {{Gerrit|3f1b32e4db3698b8}} (duration: 00m 50s)
* 20:47 mutante: scandium - turning into an mw appserver
* 20:46 mutante: puppetmaster: create mcrouter certs for scandium.eqiad.wmnet needed to make it an appserver (https://wikitech.wikimedia.org/wiki/Mcrouter#Generate_certs_for_a_new_host) ([[phab:T228069|T228069]])
* 20:29 bblack: restart pybal on lvs1014
* 19:57 bblack: lvs1016 - restart pybal for slight LVS config change for cloudelastic - [[phab:T224324|T224324]]
* 19:40 brennen@deploy1001: Synchronized php: Revert group1 and group2 back to 1.34.0-wmf.15 (duration: 00m 53s)
* 19:39 twentyafterfour: finished phabricator database dump
* 19:34 bblack: lvs1014 - puppetize and restart pybal for cloudelastic LVS - [[phab:T224324|T224324]]
* 19:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.15
* 19:20 brennen: rolling back to wmf.15 on group1 and group2 while we investigate [[phab:T229575|T229575]]
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.16
* 18:52 mutante: scandium (parsoid testing) - added mw application server roles - puppet work / maintenance
* 18:47 mutante: stat1004 - starting nagios-nrpe-server which got killed again - jbd2/md0-8 invoked oom-killer
* 18:32 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=^cloudelastic.*
* 18:30 bblack: lvs1016: puppet re-enabled, pybal restarted, cloudelastic deploy - [[phab:T224324|T224324]]
* 18:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|469c42d}}: Switch testwiki to read sessions from kask, with fallback to redis ([[phab:T222099|T222099]]) (duration: 00m 55s)
* 17:42 bblack: disable puppet on lvs1014 + lvs1016 for cloudelastic LVS merge - [[phab:T224324|T224324]]
* 17:36 twentyafterfour: running db dump on phab1003 (in tmux).  command: sudo ./bin/storage dump --output /srv/dumps/phabricator_db_20190801.sql.gz --compress
* 16:05 XioNoX: power down msw1-codfw
* 15:47 XioNoX: start codfw mgmt work - [[phab:T228112|T228112]]
* 15:40 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
* 15:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
* 15:16 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup ([[phab:T229482|T229482]]) (duration: 01m 14s)
* 15:13 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup ([[phab:T229482|T229482]]) (duration: 01m 20s)
* 14:51 herron: performing rolling restarts of eqiad logstash cluster for security updates
* 14:38 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iaaa1238}} comment-only no-op change (dbctl to 100% of production!) (duration: 00m 55s)
* 14:22 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iaaa1238}} dbctl to 100% of production! (duration: 00m 54s)
* 12:38 jbond42: add cp1008 to canary hosts https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/puppetmaster/frontend.yaml#L22
* 12:18 marostegui: Rename math table on db1089 (enwiki) - [[phab:T196055|T196055]]
* 11:42 Urbanecm: EU SWAT done
* 11:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c51baa3}}: Add files.geocollections.info to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T229547|T229547]]) (duration: 00m 55s)
* 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1e4458e}}: Add nlm.nih.gov to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T229470|T229470]]) (duration: 00m 53s)
* 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c164132}}: Revert "Revert "Switch property terms migration to WRITE_NEW on production wikidata"" ([[phab:T225053|T225053]]) (duration: 00m 55s)
* 11:19 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/ExternalGuidance/: SWAT: {{Gerrit|9402c36}}: Provide the messages in the target language of translation ([[phab:T228019|T228019]]) (duration: 00m 56s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|7db98f3}}: flaggedrevs.php: Remove useless wgAddGroups/wgRemoveGroups declarations (duration: 00m 55s)
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: {{Gerrit|aa82657}}: flaggedrevs.php: Allow wikis to remove ability to promote to/demote from autoreview/editor ([[phab:T229346|T229346]]) (duration: 00m 54s)
* 10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2058 from config [[phab:T229543|T229543]] (duration: 00m 57s)
* 10:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2058 from config [[phab:T229543|T229543]] (duration: 00m 55s)
* 10:12 jbond42: rolling upgrade for patch
* 10:10 _joe_: repooling mw1348 after reimaging as pure-php7
* 07:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2126 into s2 [[phab:T228969|T228969]] (duration: 00m 55s)
* 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2126 into s2 [[phab:T228969|T228969]] (duration: 00m 54s)
* 07:35 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8844', previous config saved to /var/cache/conftool/dbconfig/20190801-073459-marostegui.json
* 07:29 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 07:27 _joe_: removing mw1348 from rotation - reimaging for [[phab:T228976|T228976]]
* 07:10 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8843', previous config saved to /var/cache/conftool/dbconfig/20190801-071022-marostegui.json
* 07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1112 (duration: 00m 54s)
* 06:59 elukey: install python3-docopt manually on lithium to test check_anycast_healthchecker
* 06:51 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1270.eqiad.wmnet
* 06:42 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1270.eqiad.wmnet
* 06:42 _joe_: depooling mw1270 while migrating it to pure-php7
* 06:28 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1348.eqiad.wmnet
* 06:18 _joe_: depooling mw1348 while moving it to no hhvm support.
* 00:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/resources/Resources.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 54s)
* 00:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/specials/SpecialJavaScriptTest.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 54s)
* 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/resourceloader/ResourceLoader.php: {{Gerrit|acfff6751f3b8f7650}} (duration: 00m 55s)
* 00:28 krinkle@deploy1001: sync-file aborted: composer.json composer.lock dblists debug.json docroot errorpages fc-list fonts images langlist langlist-labs multiversion php php-1.34.0-wmf.13 php-1.34.0-wmf.14 php-1.34.0-wmf.15 php-1.34.0-wmf.16 phpcs.xml phpunit.xml portals private README requirements.txt robots.txt rpc scap setup.py src static test-requirements.txt tests tox.ini typos vendor w wikiversions.json wikiversions-labs.js
 
== 2019-07-31 ==
* 23:34 eileen: civicrm revision changed from {{Gerrit|218328b29d}} to {{Gerrit|857dcc9461}}, config revision is {{Gerrit|84b785d41c}}
* 23:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@db795ec]: Update mobileapps to {{Gerrit|b8c4166}} (duration: 04m 21s)
* 23:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@db795ec]: Update mobileapps to {{Gerrit|b8c4166}}
* 23:14 Urbanecm: Evening SWAT done
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: Add kask session storage configuration. Use only on testwiki, ({{Gerrit|ede989e}}, {{Gerrit|862df8d}}, [[phab:T222099|T222099]]) (duration: 00m 56s)
* 21:56 ejegg: updated fundraising python tools from {{Gerrit|2a56e5e283}} to {{Gerrit|493a38f9e0}}
* 21:32 XioNoX: set cr1-eqiad's netflow target port to 2100 (nfacctd)
* 20:58 brennen@deploy1001: Synchronized php: Revert group1 back to 1.34.0-wmf.15 (duration: 00m 53s)
* 20:55 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 back to 1.34.0-wmf.15
* 20:48 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
* 20:37 brennen@deploy1001: Synchronized php-1.34.0-wmf.16/skins/MinervaNeue/includes/MinervaHooks.php: [[gerrit:526754{{!}}Limit Recent Changes disable-table mode to Minerva skin]] [[phab:T228280|T228280]] (duration: 00m 56s)
* 20:32 mdholloway: mobileapps deploy failed, investigating
* 20:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to {{Gerrit|5eb9068}} (duration: 01m 39s)
* 20:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to {{Gerrit|5eb9068}}
* 20:01 mbsantos@deploy1001: Finished deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to {{Gerrit|529c493}} ([[phab:T227124|T227124]]) (duration: 01m 43s)
* 19:59 mbsantos@deploy1001: Started deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to {{Gerrit|529c493}} ([[phab:T227124|T227124]])
* 19:55 ejegg: updated payments-wiki from {{Gerrit|70b432d309}} to {{Gerrit|9533f70fab}}
* 18:49 mutante: phab1003 - manually running project_changes.sh to create mail to phabricator-reports@lists ([[phab:T228575|T228575]])
* 17:46 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I45b705c8}} disable dbctl on half of canary hosts (duration: 00m 57s)
* 17:21 volans@deploy1001: Synchronized wmf-config/db-codfw.php: depool db2058, I/O error, [[phab:T229449|T229449]] (duration: 00m 54s)
* 17:15 volans@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8841', previous config saved to /var/cache/conftool/dbconfig/20190731-171536-volans.json
* 16:52 Urbanecm: Morning SWAT done
* 16:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526691{{!}}Enable MobileWebUIActionsTracking schema with 50% sampling rate]] ([[phab:T220016|T220016]]) (duration: 00m 58s)
* 16:37 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: [[:gerrit:526688{{!}}Improved MobileUIActions tracking schema]] ([[phab:T220016|T220016]]) (duration: 00m 54s)
* 16:26 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/GrowthExperiments/: SWAT: [[:gerrit:526610{{!}}Only set relevant title on mobile skin]] ([[phab:T229263|T229263]], [[phab:T225659|T225659]]) (duration: 00m 51s)
* 16:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: SWAT: [[:gerrit:526612{{!}}Only set relevant title on mobile skin]] ([[phab:T229263|T229263]], [[phab:T225659|T225659]]) (duration: 00m 56s)
* 16:14 bblack: deploying VCL for H/2 coalesce 421 responses - [[phab:T207340|T207340]]
* 16:12 marostegui: Poweroff pc2010 for on-site maintenance  [[phab:T227552|T227552]]
* 15:52 mforns@deploy1001: Finished deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to {{Gerrit|eb2d9b005b26f6dddab2b59f1ba591f1758ec99f}} (duration: 13m 09s)
* 15:45 bstorm_: restarting nfs service on labstore1004
* 15:39 mforns@deploy1001: Started deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to {{Gerrit|eb2d9b005b26f6dddab2b59f1ba591f1758ec99f}}
* 15:24 thcipriani: restarting jenkins for update
* 15:22 ema: cp-ats: upgrade fifo-log-demux to 0.4 and restart atsmtail@backend.service [[phab:T229414|T229414]]
* 15:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.16
* 15:15 ema: upload fifo-log-demux 0.4 to stretch-wikimedia [[phab:T229414|T229414]]
* 15:03 XioNoX: power down re1:cr1-codfw (backup) - [[phab:T226422|T226422]]
* 14:57 godog: ms-be2018 disablepd 1I:1:1 - [[phab:T225630|T225630]]
* 14:47 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8838', previous config saved to /var/cache/conftool/dbconfig/20190731-144731-marostegui.json
* 14:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1112 (duration: 00m 46s)
* 14:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1078 after upgrade and alter (duration: 00m 47s)
* 14:28 herron: beginning rolling reboots of codfw logstash hosts for security updates
* 14:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8837', previous config saved to /var/cache/conftool/dbconfig/20190731-142814-marostegui.json
* 14:18 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I02d66736}} expand dbctl to 25% of the fleet (duration: 00m 46s)
* 14:04 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 14:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1078 after upgrade and alter (duration: 00m 46s)
* 14:01 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8836', previous config saved to /var/cache/conftool/dbconfig/20190731-140124-marostegui.json
* 13:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1078 after upgrade and alter (duration: 00m 46s)
* 13:51 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8835', previous config saved to /var/cache/conftool/dbconfig/20190731-135129-marostegui.json
* 13:49 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:46 ema: cp4021: test fifo-log-demux 0.4 [[phab:T229414|T229414]]
* 13:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:35 herron: beginning rolling restarts of codfw kafka-main brokers for security updates
* 13:32 jbond42: rolling update of exim
* 13:31 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:27 elukey: roll restart of zookeeper on conf100[4-6] and conf200[1-3] for openjdk upgrades
* 13:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter and upgrade (duration: 00m 47s)
* 13:19 marostegui: Upgrade db1078
* 13:19 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8834', previous config saved to /var/cache/conftool/dbconfig/20190731-131900-marostegui.json
* 13:15 marostegui: Drop abuse_filter_log.afl_log_id in s3 eqiad - [[phab:T226851|T226851]]
* 13:12 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:05 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 12:53 marostegui: Drop abuse_filter_log.afl_log_id from s3 codfw with replication (this will cause lag in s3 codfw) - [[phab:T226851|T226851]]
* 12:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 12:22 Amir1: EU SWAT is done
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526657{{!}}Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s)
* 12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:519212{{!}}Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 47s)
* 12:05 ladsgroup@deploy1001: sync-file aborted: SWAT: [[gerrit:519212{{!}}Switch property terms migration to WRITE_NEW on production wikidata (T225053)]] (duration: 00m 03s)
* 11:56 jbond42: enable puppet fleet wide https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 deployed
* 11:52 kartik@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/ExternalGuidance: SWAT: [[gerrit:526637{{!}}Provide the messages in the target language of translation (T228019)]] (duration: 00m 46s)
* 11:41 jbond42: disable puppet to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645
* 11:40 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:526646{{!}}Fix typo in name of config (T225055)]] (duration: 00m 47s)
* 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:526197{{!}}Decrease idwiki MT threshold for publishing (T228971)]] (duration: 00m 48s)
* 11:16 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable other statements on Commons (duration: 00m 48s)
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 10:05 jbond42: rolling back https://gerrit.wikimedia.org/r/q/c9f876e9990fb171f27616515e7d125824d7a6ac
* 09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 09:49 _joe_: pruning orphaned images on contint1001
* 08:37 elukey: restart Yarn Resource Managers on an-master100[12] to pick up the new openjdk version
* 08:06 _joe_: running puppet (and restarting mtail) on all eqiad appservers
* 08:05 elukey: restart hadoop Namenodes on an-master100[12] to pick up new heap settings and new openjdk
* 07:40 marostegui: Drop abuse_filter_log.afl_log_id in s1 eqiad - [[phab:T226851|T226851]]
* 07:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8833', previous config saved to /var/cache/conftool/dbconfig/20190731-073608-marostegui.json
* 07:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2125 into s2 [[phab:T228969|T228969]] (duration: 00m 47s)
* 07:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2125 into s2 [[phab:T228969|T228969]] (duration: 00m 49s)
* 07:29 elukey: restart-hhvm on mw1290
* 07:25 marostegui: Add db2125 to tendril and zarcillo [[phab:T228969|T228969]]
* 05:44 marostegui: Drop abuse_filter_log.afl_log_id from s1 codfw with replication (this will cause lag in s1 codfw) - [[phab:T226851|T226851]]
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify that db2128 is the new sanitarium master (duration: 00m 47s)
* 05:00 marostegui: Compress s6 on labsdb1010 - [[phab:T222978|T222978]]
* 04:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/tests/phpunit/includes/parser/ParserOutputTest.php: [[phab:T229366|T229366]] (duration: 00m 46s)
* 03:59 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/includes/parser/ParserOutput.php: [[phab:T229366|T229366]] (duration: 00m 47s)
* 02:24 TimStarling: on mwmaint1002 reverted previous change using scap pull
* 01:08 TimStarling: on mwmaint1002, editing wikiversions.json locally to move wikimania2006wiki to .16, to investigate [[phab:T229366|T229366]]
* 00:24 eileen: tools revision changed from {{Gerrit|4910f1507c}} to {{Gerrit|2a56e5e283}}
* 00:04 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/CentralNotice/: [[phab:T227711|T227711]] among others (duration: 00m 47s)
* 00:01 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CentralNotice/: [[phab:T227711|T227711]] among others (duration: 00m 48s)
 
== 2019-07-30 ==
* 23:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable MobileWebUIActionsTracking schema with 50% sampling rate" ([[phab:T220016|T220016]]) (duration: 00m 47s)
* 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Specify CentralAuth and OAuth session storage separately from per-wiki session storage ([[phab:T227097|T227097]], [[phab:T227696|T227696]]) (duration: 00m 47s)
* 23:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MobileWebUIActionsTracking schema with 50% sampling rate ([[phab:T220016|T220016]]) (duration: 00m 48s)
* 22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - [[phab:T226331|T226331]] (duration: 00m 09s)
* 22:26 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - [[phab:T226331|T226331]]
* 22:23 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - [[phab:T226331|T226331]] (duration: 00m 10s)
* 22:23 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - [[phab:T226331|T226331]]
* 22:19 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]] (duration: 00m 47s)
* 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]]
* 22:18 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]] (duration: 00m 20s)
* 22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - [[phab:T226331|T226331]]
* 22:15 eileen: tools revision changed from {{Gerrit|8a464c4f0d}} to {{Gerrit|4910f1507c}} (reverted pgmysql switch)
* 22:13 ppchelko@deploy1001: Finished deploy [changeprop/deploy@76b6639]: Report 400 errors by default. [[phab:T229277|T229277]] (duration: 01m 29s)
* 22:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@76b6639]: Report 400 errors by default. [[phab:T229277|T229277]]
* 22:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]], take 2, feeds timed out (duration: 01m 03s)
* 22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]], take 2, feeds timed out
* 22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]] (duration: 18m 40s)
* 21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. [[phab:T229060|T229060]]
* 19:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.15"
* 19:19 mutante: restbase2017 - sudo systemctl start cassandra-b after it had failed for unknown reason
* 19:19 XioNoX: repool ulsfo
* 19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.16
* 18:49 XioNoX: rollback vrrp priority changes on cr4-ulsfo
* 18:48 XioNoX: rollback bump cr4-ulsfo<->cr1-codfw ospf metric
* 18:39 XioNoX: restart cr4-ulsfo
* 18:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 18:38 XioNoX: bump cr4-ulsfo<->cr1-codfw ospf metric
* 18:26 XioNoX: failover VRRP master to cr3-ulsfo
* 18:25 XioNoX: activate transit BGP groups on cr3-ulsfo
* 18:25 XioNoX: rollback - bump cr3-ulsfo<->cr2-eqord ospf metric
* 18:15 XioNoX: restart cr3-ulsfo
* 18:15 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache (duration: 18m 23s)
* 18:14 XioNoX: bump cr3-ulsfo<->cr2-eqord ospf metric
* 18:07 XioNoX: deactivate transit BGP groups on cr3-ulsfo
* 18:06 XioNoX: failover VRRP master to cr4-ulsfo
* 17:56 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache
* 17:55 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.11 (duration: 07m 40s)
* 17:53 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@af8b471]: Update mobileapps to {{Gerrit|ec865a7}} (duration: 05m 45s)
* 17:47 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@af8b471]: Update mobileapps to {{Gerrit|ec865a7}}
* 17:20 XioNoX: depool ulsfo for routers upgrades - [[phab:T227886|T227886]]
* 17:15 godog: use wezen.codfw.wmnet instead of syslog.codfw.wmnet for production hosts
* 17:00 thcipriani: gerrit restart incoming -- gc time increasing causing timeouts
* 16:46 XioNoX: adding port 9105 to term prometheus in filter labs-in4 - [[phab:T225296|T225296]]
* 16:41 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Icf57a2ab}} enable dbctl on all mw canaries (duration: 00m 47s)
* 16:37 brennen: cutting 1.34-wmf.16
* 16:33 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 16:22 godog: bounce rsyslog on centrallog1001 - [[phab:T199406|T199406]]
* 15:41 elukey@cumin1001: END (FAIL) - Cookbook sre.kafka.roll-restart-brokers (exit_code=99)
* 15:28 legoktm@deploy1001: Finished scap: Rebuild l10n cache for SecureLinkFixer message (duration: 18m 51s)
* 15:21 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 15:18 jijiki: Disable puppet on mw1347 and mw2136, depool and pool back - [[phab:T219150|T219150]]
* 15:13 elukey: remove snakebite from buster-wikimedia (not needed anymore)
* 15:09 legoktm@deploy1001: Started scap: Rebuild l10n cache for SecureLinkFixer message
* 15:06 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer everywhere ([[phab:T200751|T200751]]) (duration: 00m 47s)
* 14:48 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I17c55428}} dbctl canary on mwdebug*, mw1261, mw1276 (duration: 00m 47s)
* 14:36 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 47s)
* 14:34 cdanis@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 47s)
* 14:33 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|Ie98a8d9e}} dbctl canary on mwdebug1001 (duration: 00m 48s)
* 14:14 fsero: refreshing calico policy from code in eqiad
* 14:13 fsero: refreshing calico policy from code in codfw
* 13:38 marostegui: Move db2094:3315 from db2066 to db2128 - [[phab:T228258|T228258]]
* 13:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 13:13 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 12:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8824', previous config saved to /var/cache/conftool/dbconfig/20190730-123630-marostegui.json
* 12:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 12:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 12:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 12:13 jbond42: while testing some changes on the puppet master, a bad config caused a small blip in catalogue compilation
* 12:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 11:34 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:30 jijiki: Depool mw1348 and pool back
* 11:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 09:49 elukey: upload python-snakebite to buster-wikimedia (rebuilt for buster from source)
* 09:31 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 09:27 elukey: add thirdparty/cloudera to buster-wikimedia and import packages to it (pull from the jessie component)
* 08:17 marostegui: Stop MySQL on db2038 [[phab:T227565|T227565]]
* 08:10 marostegui: Remove db2038 from tendril and zarcillo [[phab:T227565|T227565]]
* 08:04 akosiaris: delete orespoolcounter{1,2}00{1,2} [[phab:T227640|T227640]]
* 08:04 akosiaris: revoke and deactivate orespoolcounter{1,2}00{1,2} [[phab:T227640|T227640]]
* 07:30 godog: bounce hhvm on mw1221
* 05:36 marostegui: Disable puppet on cumin2001 to investigate a backups issue
* 05:25 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/jobqueue/jobs/AssembleUploadChunksJob.php: [[phab:T228929|T228929]] (duration: 00m 46s)
* 05:24 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/api/ApiUpload.php: [[phab:T228929|T228929]] (duration: 00m 47s)
* 05:23 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/upload/UploadBase.php: [[phab:T228929|T228929]] (duration: 00m 48s)
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s8 read-only [[phab:T227062|T227062]] (duration: 00m 24s)
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104 [[phab:T227062|T227062]] (duration: 00m 24s)
* 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s8 on read-only [[phab:T227062|T227062]] (duration: 00m 26s)
* 05:00 marostegui: Starting s8 failover from db1071 to db1104 - [[phab:T227062|T227062]]
* 04:48 eileen: civicrm revision changed from {{Gerrit|1d57aca19c}} to {{Gerrit|218328b29d}}, config revision is {{Gerrit|3f960c48f6}}
* 04:15 marostegui: Start pre-steps for s8 primary master failover - [[phab:T227062|T227062]]
* 02:37 eileen: civicrm revision changed from {{Gerrit|121feb5d53}} to {{Gerrit|1d57aca19c}}, config revision is {{Gerrit|3f960c48f6}}
 
== 2019-07-29 ==
* 23:37 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ams
* 23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: Make welcome and discovery tours fully mutually exclusive ([[phab:T229044|T229044]]) (duration: 00m 48s)
* 23:26 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ulsfo
* 23:22 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in Dallas
* 22:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/MessageCache.php: [[phab:T208897|T208897]] - {{Gerrit|fa817b088e43975}} (duration: 00m 47s)
* 22:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: [[phab:T214674|T214674]] - {{Gerrit|bfcaf0c26d6}} (duration: 00m 48s)
* 22:28 XioNoX: roll out anycast DNS and syslog to all network devices - [[phab:T228190|T228190]]
* 22:16 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: [[phab:T214674|T214674]] - {{Gerrit|940955ea3844721a0}} (duration: 00m 48s)
* 22:05 XioNoX: replace ulsfo network devices' DNS target with 10.3.0.1
* 22:00 Krinkle: krinkle@deploy1001: Dirty git status on extensions/AbuseFilter and extensions/CheckUser in php-1.34.0-wmf.15
* 21:43 XioNoX: replace ulsfo network devices' syslog target with syslog.anycast.wmnet
* 19:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c3ffbee]: Weekly deploy (duration: 11m 42s)
* 19:10 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c3ffbee]: Weekly deploy
* 18:23 Urbanecm: Morning SWAT done
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526196{{!}}Rename Image-reviewer to image-reviewer on fawiki]] ([[phab:T216406|T216406]]) (duration: 00m 47s)
* 18:19 Urbanecm: Run mwscript migrateUserGroup.php --wiki=fawiki Image-reviewer image-reviewer ([[phab:T216406|T216406]])
* 18:18 XioNoX: switch traffic to the GTT link between Ashburn and Amsterdam (set GTT metric to 820 vs. 1820 before) - [[phab:T228827|T228827]]
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:430627{{!}}Add several rights to eliminators in fawiki]] ([[phab:T176553|T176553]], 2/2) (duration: 00m 47s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: [[:gerrit:430627{{!}}Add several rights to eliminators in fawiki]] ([[phab:T176553|T176553]], 1/2) (duration: 00m 47s)
* 18:04 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter: SWAT: [[:gerrit:525598{{!}}Initialize user-defined variables during shortcircuit]] ([[phab:T214674|T214674]]) (duration: 00m 49s)
* 17:37 ejegg: updated payments-wiki config to {{Gerrit|a7dacbf8e9}}
* 17:08 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-anycast-healthchecker
* 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-json-logger
* 17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia anycast-healthchecker
* 16:47 godog: add anycast syslog to wezen/centrallog1001
* 16:19 elukey: manually stopped the sre.kafka.roll-restart-brokers cookbook after 4 broker restarts since the sleep interval (10 mins) is too tight.
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97)
* 15:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Produce resource_change stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 46s)
* 15:34 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 15:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce resource_change stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 47s)
* 14:35 papaul: shutting down pc2010 for maintenance
* 13:57 cdanis@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8816', previous config saved to /var/cache/conftool/dbconfig/20190729-135730-cdanis.json
* 13:30 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 13:28 marostegui: Stop MySQL on pc2010 - [[phab:T227552|T227552]]
* 13:23 arturo: [[phab:T228870|T228870]] reboot cloudvirt1007.eqiad.wmnet for kernel updates
* 13:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:23 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 13:09 arturo: [[phab:T228870|T228870]] reboot cloudvirt1006.eqiad.wmnet for kernel updates
* 13:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 13:01 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 12:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2128 into s5 api [[phab:T221533|T221533]] (duration: 00m 47s)
* 12:45 marostegui: Provision db2128 into s5 codfw - [[phab:T228969|T228969]]
* 12:44 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2128 into s5 api [[phab:T221533|T221533]] (duration: 00m 47s)
* 12:39 arturo: [[phab:T228870|T228870]] reboot cloudvirt1005.eqiad.wmnet for kernel updates
* 12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 12:20 arturo: [[phab:T228870|T228870]] reboot cloudvirt1004.eqiad.wmnet for kernel updates
* 12:20 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:20 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 arturo: [[phab:T228870|T228870]] reboot cloudvirt1003.eqiad.wmnet for kernel updates
* 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:36 arturo: icinga downtime toolschecker for 6h
* 11:31 arturo: [[phab:T228870|T228870]] reboot cloudvirt1002.eqiad.wmnet for kernel updates
* 11:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:14 arturo: [[phab:T228870|T228870]] reboot cloudvirt1001.eqiad.wmnet for kernel updates
* 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:11 dcausse: EU SWAT done
* 11:10 dcausse@deploy1001: Synchronized wmf-config/SearchSettingsForWikidata.php: [cirrus] Use correct factory declaration for EntityFullTextQueryBuilder (duration: 00m 47s)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:526125{{!}} Bumping portals to master (T128546)]] (duration: 00m 47s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:526125{{!}} Bumping portals to master (T128546)]] (duration: 00m 47s)
* 09:49 marostegui: Add db2128 to tendril and zarcillo - [[phab:T228969|T228969]]
* 09:24 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99)
* 09:22 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:55 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 08:51 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 08:47 elukey: set mcrouter async behavior for codfw replication to all mw app/api servers (changes will be picked up when puppet runs on the hosts) - [[phab:T225642|T225642]]
* 08:35 godog: temp stop puppet on cp hosts to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/525259
* 08:32 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 08:16 marostegui: Drop abuse_filter_log.afl_log_id in s7 eqiad - [[phab:T226851|T226851]]
* 07:49 dcausse: elastic@eqiad force recovery of failed shards (eswiki stuck)
* 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2038 from config [[phab:T221533|T221533]] (duration: 00m 46s)
* 07:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2038 from config [[phab:T221533|T221533]] (duration: 00m 50s)
* 07:18 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 06:45 akosiaris: poweroff orespoolcounter{1,2}00{1,2} for removal [[phab:T227640|T227640]]
* 06:37 _joe_: restarted php7.2 on mwdebug1002, low opcache
* 06:36 _joe_: restarted coherence report on netmon1002, it failed earlier this morning
* 06:31 _joe_: restarting nrpe on restbase-dev1006 [[phab:T224260|T224260]]
* 06:30 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 in preparation for Tuesday 30th failover in s8 (duration: 00m 54s)
* 05:18 marostegui: Drop abuse_filter_log.afl_log_id from s7 codfw with replication (this will cause lag in s7 codfw) - [[phab:T226851|T226851]]
* 05:05 marostegui: Remove db1072 from tendril and zarcillo [[phab:T228956|T228956]]
 
== 2019-07-28 ==
* 15:13 arturo: disable 1m load average check in icinga for labstore1007 for 24h
 
== 2019-07-27 ==
* 17:39 bd808: Updated profile & images for @wikimediatech twitter account
* 14:49 godog: bounce rsyslog on wezen / centrallog1001
* 06:43 elukey: powercycle mw1300 - no ssh, serial com2 stuck with no root login available
* 00:35 mutante: restbase-dev1006 - starting nagios-nrpe-server
* 00:33 mutante: wikitech-static - fix /etc/letsencrypt/renewal/wikitech-static.wikimedia.org.conf - remove webroot_map and the line for status.wm.org that caused errors when doing a renewal dry-run. Now the dry run finishes successfully and we are using the "webroot" authenticator and not "apache" anymore. This should have resolved what this ticket was about. No more Apache kills/restarts on renewal. ([[phab:T214640|T214640]])
 
== 2019-07-26 ==
* 23:51 mutante: restbase-dev1006 - manually booting into PXE to debug boot issue / start Debian installer ([[phab:T224260|T224260]])
* 23:27 mutante: restbase-dev1006 - does not boot - hangs at "attempting to boot from C:" - entering "Legacy BIOS One Time Boot Menu" ([[phab:T224260|T224260]])
* 21:52 mutante: restbase-dev1006 - power reset via mgmt
* 20:48 mutante: restbase-dev1006 - rebooting from busybox shell where it was idling since a failed reimage attempt
* 20:22 foks: reset password for Sharons36
* 18:43 XioNoX: remove lvs100[1-6] switch config from asw2-d-eqiad - [[phab:T224223|T224223]]
* 18:33 mutante: deploy2001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
* 18:32 mutante: deploy1001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
* 18:20 XioNoX: remove lvs100[1-6] switch config from asw2-c-eqiad - [[phab:T224223|T224223]]
* 18:08 XioNoX: remove lvs100[1-6] switch config from asw2-b-eqiad - [[phab:T224223|T224223]]
* 18:01 XioNoX: remove lvs100[1-6] switch config from asw2-a-eqiad - [[phab:T224223|T224223]]
* 17:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:37 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow/includes/Search/Iterators/TopicIterator.php: [[phab:T229114|T229114]] make orderUUID public, as it is needed by other classes for Dumps (duration: 00m 47s)
* 15:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|de0822497919b1b}} (duration: 00m 48s)
* 15:02 Krinkle: krinkle@deploy1001: php-1.34.0-wmf.15 is still dirty on extensions/CheckUser
* 14:23 ema: re-enable puppet on cache nodes [[phab:T229091|T229091]]
* 14:10 ema: disable puppet on cache nodes [[phab:T229091|T229091]]
* 13:41 fsero: sudo -i reprepro --ignore=wrongdistribution include stretch-wikimedia /home/fsero/envoyproxy_1.11.0~wmf1_amd64.changes
* 13:41 jeh: updated labstore100[67].wikimedia.org performance scaling_governor [[phab:T225713|T225713]]
* 13:07 jeh: rebooting labstore1006.wikimedia.org for updates [[phab:T224228|T224228]]
* 13:00 Urbanecm: Change user email assigned to SUL user Stansfield ([[phab:T229004|T229004]])
* 12:45 jeh: rebooting labsdb1012.eqiad.wmnet for updates [[phab:T224228|T224228]]
* 12:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 vslow [[phab:T221533|T221533]] (duration: 00m 50s)
* 09:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 [[phab:T228969|T228969]] (duration: 00m 47s)
* 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2123 into s5 [[phab:T228969|T228969]] (duration: 00m 48s)
* 08:42 marostegui: Add db2123 to tendril and zarcillo - [[phab:T228969|T228969]]
* 06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 47s)
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 47s)
* 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 46s)
* 05:40 marostegui: Stop MySQL on db1072 to get it ready for decommission - [[phab:T228956|T228956]]
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 48s)
* 05:05 marostegui: Stop MySQL on db1096 for upgrade
* 05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 00m 49s)
* 00:53 ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs
* 00:51 ejegg: re-enabled donations queue consumer
* 00:15 ejegg: disabled donations queue consumer
 
== 2019-07-25 ==
* 23:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/extension.json: Fix over-eager GrowthExperiments popups ([[phab:T229045|T229045]]) (duration: 00m 50s)
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:523214{{!}}Revert "Delete Image-reviewer group from commonswiki for good"]] ([[phab:T228098|T228098]]) (duration: 00m 47s)
* 23:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520364{{!}}Add sju, sjd, and rmf to wmgExtraLanguageNames]] ([[phab:T226701|T226701]]) (duration: 00m 47s)
* 23:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:525580{{!}}Enable VisualEditor in namespace Wikipédia on Slovak Wikipedia]] ([[phab:T229014|T229014]]) (duration: 00m 48s)
* 22:34 ejegg: re-enabled donations queue consumer
* 22:07 bblack: lvs1013 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 22:04 bblack: lvs1014 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 22:02 bblack: lvs1015 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 22:02 ejegg: turned off dedupe_civicrm_contacts fundraising job
* 21:59 bblack: lvs1016 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:47 bblack: primary high-traffic2 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:46 XioNoX: apply export BGP_Wikimedia_no_dfz to eqiad's Confed_esams - [[phab:T227808|T227808]]
* 21:40 ejegg: turned off major_gifts_addresses fundraising job
* 21:38 bblack: primary high-traffic1 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:07 bblack: backup lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 20:54 hashar: Rebasing mediawiki/extensions/MobileFrontend@wmf/1.34.0-wmf.15 for a build/CI related change to package.json https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/525632/
* 20:37 XioNoX: add prometheus-bird-exporter to stretch-wikimedia repo
* 20:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:15 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], feeds timing out. (duration: 05m 34s)
* 19:53 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], feeds timing out.
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:53 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 3 (duration: 03m 14s)
* 19:49 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 3
* 19:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 2 (duration: 06m 33s)
* 19:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:44 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:42 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 2
* 19:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]] (duration: 13m 42s)
* 19:29 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]]
* 19:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:01 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 18:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:36 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 17:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 17:58 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to {{Gerrit|200a323}} ([[phab:T228938|T228938]] [[phab:T228287|T228287]]) (duration: 04m 39s)
* 17:53 mbsantos@deploy1001: Started deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to {{Gerrit|200a323}} ([[phab:T228938|T228938]] [[phab:T228287|T228287]])
* 17:51 elukey: powercycle stat1007
* 17:44 volans: sudo cumin -s30 -b1 -m async 'A:wdqs-all and not A:wdqs-internal and not P{wdqs1009.eqiad.wmnet}' 'run-puppet-agent -e "volans - [[phab:T228122|T228122]] - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
* 17:33 volans: running sudo cumin -s30 -b1 -m async 'A:wdqs-internal' 'run-puppet-agent -e "volans - [[phab:T228122|T228122]] - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
* 17:18 volans: disabled puppet on A:wdqs-all, deploying gerrit/524954 - [[phab:T228122|T228122]]
* 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.rolling-restart-workers (exit_code=0)
* 17:01 elukey@cumin1001: START - Cookbook sre.hadoop.rolling-restart-workers
* 16:54 bblack: lvs5001 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 16:53 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikibaseMediaInfo/resources/statements/: [[phab:T228807|T228807]] Fix formatValue abort handling (duration: 00m 48s)
* 16:52 jijiki: Rolling restart of hhvm across the fleet
* 16:50 bblack: lvs5002 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 16:44 bblack: lvs5003 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 16:19 jijiki: Disable puppet on mw* servers for 525156
* 15:52 jeh: rebooting cloudstore1008.wikimedia.org for updates [[phab:T224228|T224228]]
* 15:41 jeh: rebooting cloudstore1009.wikimedia.org for updates [[phab:T224228|T224228]]
* 15:41 nuria@deploy1001: Finished deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions (duration: 13m 40s)
* 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 15:32 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove redundant wgResourceLoaderStorageEnabled override (duration: 00m 50s)
* 15:27 nuria@deploy1001: Started deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions
* 15:09 jeh: rebooting labstore1004.eqiad.wmnet for updates [[phab:T224228|T224228]]
* 14:42 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2 (duration: 00m 19s)
* 14:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:02 moritzm: installing Java security updates on Druid servers
* 13:52 moritzm: installing Java security updates on AQS, Hadoop and Kafka/Jumbo servers
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:38 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:35 robh: cloudvirt1015 offline for ram swap via [[phab:T220853|T220853]]
* 13:20 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 13:19 fsero: recreating clusterrole deploy from helmfile in staging
* 13:09 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - [[phab:T226851|T226851]]
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.15
* 12:49 marostegui: Drop abuse_filter_log.afl_log_id in s4 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 11:53 marostegui: Compress s3 wikis on labsdb1010 - [[phab:T222978|T222978]]
* 11:03 arturo: update stretch-wikimedia/thirdparty/kubeadm-k8s on install1002 for [[phab:T215531|T215531]] (kubeadm 1.15.1)
* 10:53 moritzm: rebooting cloudvirt2003-dev
* 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:35 moritzm: rebooting cloudvirt1024 for kernel update
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:21 marostegui: Failover m1 from dbproxy1006 to dbproxy1001 - [[phab:T227139|T227139]]
* 08:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:54 moritzm: rebooting cloudvirt2001-dev
* 08:32 Urbanecm: Password reset for SUL user Strejc
* 08:04 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw128[0-3].*
* 08:01 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad,name=mw12(6[89]{{!}}7[0-5]).*
* 08:01 _joe_: repooling mw1268-1275 in the appserver cluster
* 08:00 moritzm: rebooting cloudvirt2001-dev
* 07:59 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw12(7[6-9]{{!}}8[0-3]).*
* 07:59 _joe_: repooling mw1276-1283 in the API cluster
* 07:33 moritzm: rebooting cloudvirt2001-dev
* 07:23 marostegui: Upgrade MySQL on db1072
* 07:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:42 elukey: restart kafka* on kafka-jumbo1001 to pick up new openjdk-8 version
* 06:37 elukey: restart cassandra instances on aqs1004 to pick up new openjdk-8 version
* 06:34 elukey: add term eventgate to analytics-in4 on cr1/cr2-eqiad - [[phab:T228882|T228882]]
* 05:31 twentyafterfour: set phabricator to read-write mode
* 05:30 marostegui: Failover m3 from db1072 to db1128 - [[phab:T228243|T228243]]
* 05:30 twentyafterfour: phabricator set to read-only mode
* 04:51 marostegui: Start pre-failover steps on m3 [[phab:T228243|T228243]]
* 02:02 XioNoX: remove peer AS63541 from cr1-eqsin
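
The conftool selectors logged above are regular expressions over host names, and <code>{{!}}</code> is the wikitext escape for the alternation bar <code>|</code>. As a minimal sketch, the 08:01 selector can be checked against a host list to confirm it covers mw1268-1275, as the accompanying 08:01 note says. The host inventory and the <code>matching_hosts</code> helper are hypothetical, and whole-name matching is an assumption here, not a statement of how conftool anchors its selectors.

```python
import re

# Selector from the 08:01 entry, with {{!}} written as a plain "|".
SELECTOR = r"mw12(6[89]|7[0-5]).*"


def matching_hosts(pattern, candidates):
    """Return the candidates whose whole name matches the pattern.

    Whole-name matching is an assumption made for this sketch.
    """
    compiled = re.compile(pattern)
    return [host for host in candidates if compiled.fullmatch(host)]


# Hypothetical inventory: mw1260 through mw1289 in eqiad.
hosts = [f"mw{n}.eqiad.wmnet" for n in range(1260, 1290)]

pooled = matching_hosts(SELECTOR, hosts)
print(pooled[0], pooled[-1], len(pooled))
```

Running the same check before applying a selector is a cheap way to catch a pattern that matches more or fewer hosts than intended.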
 
== 2019-07-24 ==
* 23:46 nuria@deploy1001: Finished deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2 (duration: 13m 34s)
* 23:43 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 01s)
* 23:42 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 03s)
* 23:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage for 50% of new users on arwiki ([[phab:T228120|T228120]]) (duration: 00m 58s)
* 23:32 nuria@deploy1001: Started deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2
* 23:30 nuria@deploy1001: Finished deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues) (duration: 18m 10s)
* 23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage on arwiki ([[phab:T228120|T228120]]) (duration: 00m 55s)
* 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Correct typo in arwiki help panel config ([[phab:T228820|T228820]]) (duration: 00m 57s)
* 23:12 nuria@deploy1001: Started deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues)
* 22:41 thcipriani@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 22:36 thcipriani@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 22:28 thcipriani@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 21:22 mutante: <+icinga-wm> RECOVERY - Device not healthy -SMART- on restbase-dev1006 is OK: All metrics within thresholds. ([[phab:T224260|T224260]])
* 21:18 cscott@deploy1001: Finished deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]]) (duration: 18m 35s)
* 21:16 nuria@deploy1001: Finished deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95 (duration: 03m 54s)
* 21:12 nuria@deploy1001: Started deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95
* 21:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]] (duration: 18m 18s)
* 21:00 cscott@deploy1001: Started deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]])
* 20:45 ppchelko@deploy1001: Started deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]]
* 20:39 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}} (duration: 04m 20s)
* 20:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]] (duration: 01m 34s)
* 20:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]]
* 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}}
* 20:12 jeh: redirecting dumps.wikimedia.org back to labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 19:43 ejegg: updated fundraising CiviCRM from {{Gerrit|875ab97742}} to {{Gerrit|121feb5d53}}
* 19:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer on group0 wikis - [[phab:T200751|T200751]] (duration: 00m 55s)
* 18:33 cmjohnson1: moving cloudvirt107 to 10G rack [[phab:T228691|T228691]]
* 18:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/localisation/LocalisationCache.php: {{Gerrit|31d99eb381bc}} (duration: 00m 54s)
* 18:15 ejegg: updated payments-wiki from {{Gerrit|a28ad541ed}} to {{Gerrit|70b432d309}}
* 18:13 urandom: creating new restbase keyspaces -- [[phab:T228804|T228804]]
* 18:12 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34.0-wmf.15
* 17:14 XioNoX: rollback failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 17:10 XioNoX: Add mr1-codfw<->cr1/2-codfw vlan/link config on asw-a-codfw - [[phab:T228112|T228112]]
* 16:44 jijiki: Rolling puppet-enable and apache reload of jobrunners in codfw
* 16:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 16:12 bblack: re-pooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 16:11 bblack: lvs1014 - restore puppet and resolv.conf contents, restart pybal
* 16:10 bblack: dns1001 - restart recursor and re-enable puppet - [[phab:T226782|T226782]]
* 16:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 16:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 15:59 bblack: dns1001 - puppet disable, stop recursor service to kill anycast advert - [[phab:T226782|T226782]]
* 15:59 bblack: lvs1014 - puppet disable, remove dns1001 from resolv.conf, restart pybal - [[phab:T226782|T226782]]
* 15:58 XioNoX: failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 15:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 15:56 bblack: depooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 15:56 bblack: depooling recdns on dns1001 via confctl
* 15:47 jijiki: Rolling puppet-enable and apache reload of jobrunners in eqiad
* 15:44 jeh: rebooting labstore1007.wikimedia.org for updates [[phab:T224228|T224228]]
* 15:42 jijiki: Disable puppet on jobrunners for 525306
* 15:11 herron: resume ingesting [message] =~ /^SlowTimer/ logs on logstash1007 (as a canary)
* 15:02 XioNoX: re-enable vc link between asw2-a6 and asw2-a7 - [[phab:T228823|T228823]]
* 14:58 jeh: unmounting dumps NFS clients from labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 14:54 XioNoX: cleared vc ports stats on asw2-a-eqiad - [[phab:T228823|T228823]]
* 14:43 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - [[phab:T226851|T226851]]
* 14:40 marostegui: Drop abuse_filter_log.afl_log_id in s5 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 14:31 tarrow@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 13:49 robh: rebooting cloudvirt1015 into OS, memory error confirmed.  new memory replacement dispatch entered via [[phab:T220853|T220853]]
* 13:31 marostegui: Drop abuse_filter_log.afl_log_id in s2 eqiad - [[phab:T226851|T226851]]
* 13:25 robh: rebooting cloudvirt1015 into memtest for dell support repair via [[phab:T220853|T220853]]
* 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.15 (duration: 00m 54s)
* 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.15
* 12:19 marostegui: Stop haproxy on dbproxy1004 and dbproxy1009 (m4 - eventlogging) - [[phab:T228768|T228768]]
* 11:23 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:525254{{!}}Disable FileImporter source wiki edits (T228851)]] (duration: 00m 54s)
* 11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 2/2) (duration: 00m 54s)
* 11:10 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 1/2) (duration: 00m 59s)
* 10:04 marostegui: Drop abuse_filter_log.afl_log_id from labswiki (wikitech) and labtestwiki - [[phab:T226851|T226851]]
* 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1082 (duration: 00m 55s)
* 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 into API after upgrade (duration: 00m 55s)
* 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after upgrade (duration: 00m 54s)
* 08:40 marostegui: Stop MySQL on db1082 for upgrade
* 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for upgrade (duration: 00m 57s)
* 08:35 marostegui: Drop abuse_filter_log.afl_log_id in s2 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 07:58 marostegui: Drop abuse_filter_log.afl_log_id from wikidata in eqiad - [[phab:T226851|T226851]]
* 07:21 marostegui: Stop MySQL on db1117:3322 to check dbproxy1013 notifications - [[phab:T202367|T202367]]
* 07:10 marostegui: Deploy grants for dbproxy1013 in m2 - [[phab:T202367|T202367]]
* 05:00 marostegui: Stop puppet on dbprov2001 to generate s5 mysqldump manually
* 04:52 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:51 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:50 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 53s)
* 04:49 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:45 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 04:40 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: (no justification provided) (duration: 00m 56s)
* 03:41 tstarling@deploy1001: Synchronized w/fatal-error.php: Adding post-send exception test for [[phab:T228462|T228462]] (duration: 00m 54s)
* 03:39 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adding DeferredUpdates log channel ([[phab:T228462|T228462]]) (duration: 00m 56s)
* 02:01 eileen: payments-wiki revision changed from {{Gerrit|224c6b2d7b}} to {{Gerrit|a28ad541ed}}, config revision is {{Gerrit|8dcb77cf22}}
 
== 2019-07-23 ==
* 23:44 eileen: civicrm revision changed from {{Gerrit|88e9f24893}} to {{Gerrit|875ab97742}}, config revision is {{Gerrit|4006d3bdc5}}
* 23:43 shdubsh: reverting logstash mitigations and re-enable puppet
* 23:42 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/diff/DifferenceEngine.php: [[phab:T228766|T228766]] Don't double wrap rollback links (duration: 00m 56s)
* 23:31 mutante: mw1267 - rm -rf /srv/mediawiki/php-1.33.0-wmf.23 ; rm -rf /srv/mediawiki/php-1.32.0-wmf.3 ; scap pull
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 22:36 mutante: rolling out scap 3.11.1-1 on mw-eqiad servers
* 22:14 mutante: continuing rollout of new scap version 3.11.1-1, starting with kafka-all followed by other cumin-alias groups ([[phab:T228328|T228328]])
* 22:06 herron: puppet temporarily disabled on eqiad/codfw logstash collectors while catching up with backlog. see /etc/logstash/conf.d/01-filter_temp_drops.conf
* 21:52 herron: logstash - temporarily dropping logs matching [message] =~ /^SlowTimer/ due to UTF-8 parsing errors that are stopping the logstash processing pipeline.  will re-enable after logstash has caught up with the backlog
* 20:59 shdubsh: temporarily disable input-kafka-rsyslog-shipper and drop memcached logs on logstash nodes
* 20:08 paravoid: asw2-a-eqiad: request virtual-chassis vc-port set interface member 6 vcp-255/1/0 disable
* 19:58 eileen: process-control config revision is {{Gerrit|4006d3bdc5}} - disabled drush fill donor totals job
* 19:49 mutante: mwdebug1002 - restarting hhvm - mw1312 - restarted apache
* 19:44 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
* 19:40 mutante: restarting hhvm on mw1312
* 19:28 cdanis: depool all appservers in eqiad A7 cdanis@cumin1001.eqiad.wmnet ~ 🍵 sudo cumin 'mw12[67-83]*' 'depool'
* 19:11 bblack: repool lvs1013 - [[phab:T227143|T227143]]
* 19:10 bblack: repool cp1077 + cp1078 - [[phab:T227143|T227143]]
* 19:09 elukey: depool mw1261 for investigation
* 19:06 herron: restarting logstash on logstash100[789]
* 18:53 robh: mw1271 had power loss event due to pdu swap via [[phab:T227143|T227143]]
* 18:45 mutante: rolling out scap 3.11.1-1 on all mw codfw servers ([[phab:T228328|T228328]])
* 18:43 mutante: rolling out scap 3.11.1-1 on mw canary servers ([[phab:T228328|T228328]])
* 18:13 robh: started depooling servers in a7-eqiad for pdu work via [[phab:T227143|T227143]]
* 18:11 cdanis: depool mw1267
* 18:10 cdanis: cdanis@mw1267.eqiad.wmnet /srv/mediawiki ☕ scap pull
* 18:09 cdanis: cdanis@mw1267.eqiad.wmnet ~ ☕ sudo apt install python-concurrent.futures
* 18:08 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 54s)
* 18:07 James_F: Belay that, error on mw1267.
* 18:06 James_F: Sync error on mw1314.eqiad.wmnet, No module named concurrent.futures
* 18:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 57s)
* 18:05 bblack: lvs1013 - disable puppet and stop pybal - [[phab:T227143|T227143]]
* 18:04 bblack: depool cp1077 + cp1088 - [[phab:T227143|T227143]]
* 18:03 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|8def4af1d}} noc db.php: include readonly status & group loads (duration: 00m 55s)
* 17:52 moritzm: installing Java security updates on kafka/main and Logstash servers
* 17:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2 (duration: 01m 37s)
* 17:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2
* 17:00 ppchelko@deploy1001: Finished deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]] (duration: 01m 30s)
* 16:58 ppchelko@deploy1001: Started deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]]
* 16:22 godog: pool prometheus1003 - [[phab:T227139|T227139]]
* 15:46 robh: side b of a5-eqiad swapping pdu via [[phab:T227141|T227141]]
* 15:14 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 15:08 _joe_: uninstalling php-pear, php-mail, php-mail-mime from mw1267 [[phab:T195364|T195364]]
* 14:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2 (duration: 13m 08s)
* 14:39 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2
* 14:14 robh: a3-eqiad pdu swap taking place now via [[phab:T227139|T227139]]
* 13:47 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:45 godog: depool restbase1016 restbase1019 restbase1011 restbase1010 prometheus1003 ahead of PDU work - [[phab:T227139|T227139]]
* 13:45 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:44 moritzm: installing Java security updates on furud/flerovium
* 13:43 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:27 jeh: dumps switching active vps to labstore1006 [[phab:T224228|T224228]]
* 13:17 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 13:06 marostegui: Drop abuse_filter_log.afl_log_id from s8 codfw (lag will happen on codfw s8) - [[phab:T226851|T226851]]
* 12:33 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (duration: 29m 46s)
* 12:04 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache
* 12:02 akosiaris: drain kubernetes1001. [[phab:T227139|T227139]]
* 12:01 akosiaris: empty ganeti1007 from running instances. [[phab:T227139|T227139]]
* 11:59 akosiaris: enable disable poolcounter1003, switchover codfw poolcounters [[phab:T224572|T224572]]
* 11:58 tarrow: EU SWAT finished
* 11:58 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
* 11:56 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525065{{!}}T214902 Fix missing /termbox in SSRTermboxServerUrl]] (duration: 00m 44s)
* 11:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.10 (duration: 07m 55s)
* 11:43 jijiki: restart php-fpm on mwdebug*
* 11:25 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525062{{!}}T214902 Enable termbox on testwikidatawiki]] (duration: 01m 37s)
* 11:08 jijiki: enable puppet on jobrunners
* 10:17 marostegui: Drop abuse_filter_log.afl_log_id from db1096:3316, db1139:3316 and dbstore1005:3316 [[phab:T226851|T226851]]
* 10:02 moritzm: installing Java security updates on notebook/stat hosts
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:53 marostegui: Drop abuse_filter_log.afl_log_id from s6 codfw with replication (this will cause lag in s6 codfw) - [[phab:T226851|T226851]]
* 09:51 akosiaris: enable poolcounter1005, disable poolcounter1001 [[phab:T224572|T224572]]
* 09:51 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool into API db1100 after upgrade (duration: 00m 46s)
* 09:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1100 after upgrade (duration: 00m 47s)
* 09:09 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 09:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1100 after upgrade (duration: 00m 46s)
* 08:34 marostegui: Upgrade db1100
* 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for upgrade (duration: 00m 53s)
* 08:08 marostegui: Stop MySQL on db2044 to test dbproxy2002 notifications - [[phab:T202367|T202367]]
* 07:31 marostegui: Deploy grants for dbproxy2002 on m2 - [[phab:T202367|T202367]]
* 04:52 eileen: civicrm revision changed from {{Gerrit|d951b07ce3}} to {{Gerrit|88e9f24893}}, config revision is {{Gerrit|f7b7622e27}}
* 04:43 marostegui: Failover m1 from dbproxy1001 to dbproxy1006 [[phab:T227139|T227139]]
* 00:06 Urbanecm: slwiki updateCollation.php completed ([[phab:T208984|T208984]])
 
== 2019-07-22 ==
* 23:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 524952 Increase hewiki rollback limit for patrollers to 50/60 (duration: 00m 48s)
* 23:54 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
* 23:42 Urbanecm: All updateCollation.php runs completed, except the one for slwiki ([[phab:T208984|T208984]])
* 23:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ptwiki ([[phab:T228521|T228521]]) (duration: 00m 47s)
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:37 Urbanecm: Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix comment in IS.php (noop, [[phab:T227000|T227000]]) (duration: 00m 46s)
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:33 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: [[:gerrit:524704{{!}}Fix "Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource"]] ([[phab:T227000|T227000]]) (duration: 00m 47s)
* 23:29 Urbanecm: Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) ([[phab:T208984|T208984]]) (duration: 00m 47s)
* 22:11 mutante: dropped zero.wikimedia.org from DNS ([[phab:T187716|T187716]])
* 21:50 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for [[phab:T227416|T227416]] (duration: 00m 46s)
* 21:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 13m 01s)
* 21:35 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporary make account creation limits more restrictive" (duration: 00m 47s)
* 21:27 eileen: civicrm revision is {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 21:25 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]]
* 21:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 16m 14s)
* 21:21 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:20 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:19 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:17 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:05 eileen: civicrm revision changed from {{Gerrit|f932e56cd2}} to {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 21:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]]
* 20:04 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0be6045]: Weekly deploy (duration: 18m 42s)
* 19:46 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0be6045]: Weekly deploy
* 19:09 ppchelko@deploy1001: Finished deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]] (duration: 01m 31s)
* 19:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]]
* 18:59 elukey: repool scb1001 after pdu maintenance
* 18:59 herron: repooling kafka1001 [[phab:T227140|T227140]]
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel for 50% of new users on arwiki ([[phab:T226729|T226729]]) (duration: 00m 47s)
* 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Trying the last sync again, because it's appearing inconsistently (duration: 00m 47s)
* 18:15 thcipriani: restarting gerrit due to [[phab:T224448|T224448]]
* 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments help panel on arwiki ([[phab:T226729|T226729]]) (duration: 00m 48s)
* 18:00 elukey: arm keyholder on netmon1002 after power loss
* 17:35 elukey: depool scb1001 for PDU work [[phab:T227140|T227140]]
* 17:22 herron: depooling kafka1001 for PDU work [[phab:T227140|T227140]]
* 17:17 nuria@deploy1001: Finished deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs (duration: 14m 51s)
* 17:02 nuria@deploy1001: Started deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs
* 17:02 jijiki: enable puppet on all jobrunners
* 16:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T87899|T87899]] Use wfLoadExtension for Collection rather than deprecated entry point (duration: 00m 47s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/extension-list: Load Collection i18n via extension.json directly (duration: 00m 47s)
* 16:36 jeh: redirecting dumps.wikimedia.org  dns to labstore1006 [[phab:T224228|T224228]]
* 15:49 jijiki: Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - [[phab:T219148|T219148]]
* 15:38 marostegui: Stop mysql and power off pc2010 for on-site maintenance - [[phab:T227552|T227552]]
* 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Wikibase/lib/WikibaseLib.php: [[phab:T227814|T227814]] Wikibase: Define $wgMessagesDirs in WikibaseLib PHP entry point (duration: 00m 48s)
* 15:27 jijiki: Depool mw1300 and pool back
* 15:24 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228614|T228614]] XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)
* 15:17 jijiki: Disable puppet on jobrunners to enable php7_only
* 14:55 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:53 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:44 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:38 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 ottomata: deploying refactored eventgate chart using eventgate-wikimedia image to  eventgate-* services -  [[phab:T226668|T226668]]
* 14:28 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 13:12 kart_: Updated cxserver to 2019-07-17-074415-production ([[phab:T227553|T227553]], [[phab:T216812|T216812]])
* 13:07 kartik@deploy1001: scap-helm cxserver finished
* 13:07 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 13:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 13:02 kartik@deploy1001: scap-helm cxserver finished
* 13:02 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 13:02 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 13:00 kartik@deploy1001: scap-helm cxserver finished
* 13:00 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 12:59 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 12:58 marostegui: Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - [[phab:T202367|T202367]]
* 12:22 moritzm: installing debian-archive-keyring Stretch update (SUA 164)
* 11:20 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:524685{{!}}Enable wgNamespacesWithSubpages on main NS for kowikiversity (T228481)]] (duration: 00m 54s)
* 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete, (remove labs customizations) (T225617, T226532)]] (duration: 00m 54s)
* 11:13 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete (T225617, T226532)]] (duration: 00m 56s)
* 10:55 jijiki: Enable puppet on jobrunners
* 10:27 jijiki: Depool and pool mw1300
* 10:23 jijiki: Disable puppet on jobrunners for 524336 - [[phab:T219148|T219148]]
* 10:21 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 10:20 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 09:47 elukey: failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - [[phab:T228620|T228620]]
* 09:40 marostegui: Deploy grants on m1 to allow connections from dbproxy1014 - [[phab:T202367|T202367]]
* 09:32 elukey: restart hadoop hdfs namenode on an-master1002 to apply new GC settings - [[phab:T228620|T228620]]
* 08:33 marostegui: Rename table enwiki.math on db2116  [[phab:T196055|T196055]]
* 07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1134 after schema change [[phab:T226851|T226851]] (duration: 00m 51s)
* 07:54 elukey: sudo -i depool on elastic1046 - broken disk (srv partition not available) - [[phab:T228606|T228606]]
* 07:40 elukey: systemctl reset-failed restbase on restbase1007->15 (decommed nodes)
* 07:27 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db1134 [[phab:T226851|T226851]]
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1134 for schema change [[phab:T226851|T226851]] (duration: 00m 56s)
* 07:17 moritzm: installing openjdk-11 security updates
* 06:47 marostegui: Stop MySQL on db2062 to test dbproxy2001 notification [[phab:T202367|T202367]]
* 06:23 elukey: restart hadoop-hdfs-namenode on an-master1002 to check for out-of-the-ordinary GC activity
* 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1104 from s8 API (duration: 00m 55s)
* 05:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 into API (duration: 00m 58s)
* 05:24 marostegui: Compress more tables on labsdb1009 - [[phab:T222978|T222978]]
* 04:48 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/extension.json: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/maintenance/loadExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:44 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/includes/TorExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 56s)
* 04:17 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: fix UBN bug [[phab:T227772|T227772]] (duration: 00m 56s)
 
== 2019-07-21 ==
* 01:06 Urbanecm: Deployed patch for [[phab:T228574|T228574]]
 
== 2019-07-19 ==
* 22:36 mutante: phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) ([[phab:T190568|T190568]] [[phab:T137928|T137928]] [[phab:T190572|T190572]])
* 21:57 eileen: update process control process-control config revision is {{Gerrit|c913a5f261}}
* 21:34 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:25 eileen: civicrm revision changed from {{Gerrit|21d3c5a3fc}} to {{Gerrit|f932e56cd2}}, config revision is {{Gerrit|9f7eba2193}}
* 19:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:35 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 19:34 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:07 eevans@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 19:02 eevans@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 17:53 cdanis@deploy1001: Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s)
* 17:44 XioNoX: change netflow target port to 2055 in eqiad
* 16:17 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:55 moritzm: rebooting mw2164 for a test
* 15:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:40 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:27 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:26 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:22 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 15:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: {{Gerrit|90eed0fad}} / [[phab:T87899|T87899]] (duration: 00m 54s)
* 14:35 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: {{Gerrit|66ce154d7d734209c76a62cf}} / [[phab:T87899|T87899]] (duration: 00m 56s)
* 14:29 ariel@deploy1001: Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s)
* 14:29 ariel@deploy1001: Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps
* 14:28 Krinkle: krinkle@deploy1001: Untracked file found in php-1.34-wmf.13
* 14:28 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14
* 13:30 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 13:04 moritzm: installing bzip2 security updates on jessie
* 12:28 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 10:56 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:55 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:53 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:53 fsero: deploying calico from helmfile in staging [[phab:T227775|T227775]]
* 10:35 jijiki: enable puppet on jobrunners
* 10:26 jijiki: disable puppet on jobrunners for 523908
* 08:37 ariel@deploy1001: Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
* 08:37 ariel@deploy1001: Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
* 08:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 08:24 gehel: repooling wdqs2004 - [[phab:T228122|T228122]]
* 08:22 gehel: repooling wdqs2003 - [[phab:T228122|T228122]]
* 08:20 vgutierrez: restart pybal on lvs2003
* 08:16 vgutierrez: restart pybal on lvs2006
* 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
* 07:57 moritzm: installing idp1001 [[phab:T228403|T228403]]
* 07:38 moritzm: rebooting tungsten for kernel update
* 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:03 elukey: restart php-fpm on mw1330 - op-cache hit ratio low
* 07:02 jynus: reloading dbproxy1004/9
* 07:01 elukey: depool wdqs2004 from all services (waiting for maintenance)
* 06:32 legoktm@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 06:30 legoktm@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 06:15 elukey: clear opcache on mwdebug*
* 05:26 fsero: repool ms-fe2005 - [[phab:T228196|T228196]]
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
* 04:11 eileen: I think I didn't push the turn it on commit - tried again  process-control config revision is {{Gerrit|9f7eba2193}}
* 03:03 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} (jobs reenabled)
* 01:52 XioNoX: enable outbound sampling on eqiad's router
* 00:52 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis ([[phab:T227416|T227416]]) (duration: 00m 58s)
* 00:38 mutante: mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
* 00:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 00:08 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} - jobs disabled
* 00:04 mutante: install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync ([[phab:T228482|T228482]], [[phab:T228328|T228328]], [[phab:T226948|T226948]])
 
== 2019-07-18 ==
* 23:50 mutante: built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and buster ([[phab:T228482|T228482]])
* 23:22 Lucas_WMDE: Evening SWAT done
* 23:17 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:523141{{!}}Configure Citoid+Wikibase integration on Beta (production no-op) (T228411)]] (duration: 00m 54s)
* 23:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:523140{{!}}Set $wgWBRepoSettings[enableRefTabs] in Wikibase.php (T228414)]] (duration: 01m 16s)
* 23:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:523139{{!}}Define settings for Citoid+Wikibase integration (T228414)]] (duration: 00m 55s)
* 22:23 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=wdqs1008.eqiad.wmnet
* 22:16 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:00 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:49 bd808: Cleaned up stale striker logs on labweb1001 and labweb1002. Logs go to journald now so log rotate is not triggered to rotate out logs from before that change.
* 21:42 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:36 bd808@deploy1001: Finished deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]]) (duration: 01m 13s)
* 21:34 bd808@deploy1001: Started deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]])
* 21:15 mutante: gerrit (cobalt) - scheduled 1h downtime, rebooting for kernel upgrade
* 21:03 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: [[phab:T228290|T228290]] Fix fatal in ChangesListFormatter::getLogTextLinks() (duration: 01m 02s)
* 20:57 mutante: gerrit2001 - icinga downtime for 1h
* 20:56 mutante: gerrit2001 - reboot for kernel upgrade
* 20:51 mutante: gerrit2001 - apt-get upgrade; apt-get autoremove ; puppet agent -tv
* 19:55 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 19:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (2/2) (duration: 00m 55s)
* 19:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (1/2) (duration: 00m 55s)
* 19:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T207750|T207750]] Revoke editmyuserjsredirect from all users (duration: 00m 54s)
* 19:25 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 19:21 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 19:20 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 18:45 mutante: contint2001 - had puppet failure in puppet board / dpkg issue due to unfinished zuul install which was done on contint1001 - stopped zuul and zuul-merger, apt-install zuul (was already latest version but needed to finish configure step), apt-get autoremove to remove unused packages, ran puppet. dpkg and puppet happy again
* 17:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/objectcache/RedisBagOStuff.php: {{Gerrit|69cd8b0f49e8caf8c7398ad76a1ce3d2da4f3e6b}} (duration: 00m 55s)
* 17:15 Krinkle: krinkle@deploy1001: Pull down https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/523844/ and  https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/524276/ (no-op, not deploying)
* 16:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:29 XioNoX: upgrade Routinator to 0.5.0 in eqiad - [[phab:T220669|T220669]]
* 16:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/resources/src/mediawiki.misc-authed-ooui/special.movePage.js: {{Gerrit|e97a284dbe54}} (duration: 00m 58s)
* 16:17 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:06 XioNoX: upgrade Routinator to 0.5.0 in codfw - [[phab:T220669|T220669]]
* 16:05 XioNoX: add routinator 0.5.0 to APT
* 15:54 fsero: depool ms-fe2005 - [[phab:T228196|T228196]]
* 15:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.34.0-wmf.13 # [[phab:T228436|T228436]] [[phab:T220739|T220739]]
* 15:19 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:46 godog: roll-restart thumbor in codfw - [[phab:T228086|T228086]]
* 14:45 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 14:37 liw: all wikis at 1.34.0-wmf.14
* 14:36 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 14:28 bblack: cp hosts: apt autoremove to clean up pkgs on the fleet
* 14:27 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery (duration: 00m 20s)
* 14:26 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery
* 14:24 godog: repool thumbor2003
* 14:20 godog: reboot thumbor2003
* 14:17 jijiki: Depool thumbor2003 for reboot
* 14:12 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 13:53 moritzm: installing php5 security updates
* 13:50 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:36 jeh: rebooting labstore1005.eqiad.wmnet - [[phab:T224228|T224228]]
* 13:34 jbond42: remove mtail 3.0.0~rc24.1-1+wmf1 from stretch-wikimedia
* 13:30 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 53s)
* 13:29 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
* 13:24 jbond42: downgrade cp servers back to 3.0.0~rc5-1~bpo9+1
* 13:23 liw: promoting 1.34.0-wmf.14 to group1
* 13:22 godog: temporarily stop ircecho on icinga1001 to avoid spam
* 13:00 jbond42: rolling upgrade of mtail
* 12:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 12:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:53 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 12:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 12:26 jbond42: add mtail 3.0.0~rc24.1-1+wmf1 to stretch-wikimedia
* 11:13 dcausse: EU Swat done
* 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert [cirrus] switch search traffic (except completion) to codfw (duration: 00m 56s)
* 11:02 godog: swift eqiad-prod: put back ms-be1043 sdk1 - [[phab:T218544|T218544]]
* 10:51 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 10:43 ema: cp-eqiad: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 10:37 jijiki: enable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 10:29 godog: reboot wezen.codfw.wmnet - [[phab:T225713|T225713]]
* 10:27 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 10:15 jijiki: Disable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:09 godog: resume swift ms-be rolling restarts - [[phab:T225713|T225713]]
* 09:03 fsero: reuploading missing layers [[phab:T228196|T228196]]
* 08:57 hashar: contint1001: stopped zuul, ran apt install to get the new python2.7 copied to Zuul virtualenv, restarted zuul/zuul-merger. That clears a couple Icinga alarms from yesterday
* 08:56 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db2116 [[phab:T226851|T226851]]
* 08:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2116 (duration: 00m 55s)
* 08:18 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 08:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:56 dcausse: deleting zerowiki elastic indices (eqiad and codfw) [[phab:T227718|T227718]]
* 05:22 marostegui: Stop MySQL on db2045, host will be decommissioned [[phab:T228281|T228281]]
* 05:18 marostegui: Remove db2045 from tendril and zarcillo [[phab:T228281|T228281]]
* 05:16 marostegui: Disable notifications on db2045 [[phab:T228281|T228281]]
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2045 from config, will be decommissioned [[phab:T228281|T228281]] (duration: 00m 54s)
* 05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2045 from config, will be decommissioned [[phab:T228281|T228281]] (duration: 00m 56s)
* 04:31 legoktm: running query for [[phab:T227843|T227843]] on mwmaint1002
 
== 2019-07-17 ==
* 23:51 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 54s)
* 23:48 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 53s)
* 22:35 mutante: reimaging mw2250 after disks have been replaced
* 22:16 hoo: Manually started the Wikidata RDF dumps on snapshot1008 (due to [[phab:T228104|T228104]])
* 21:42 apergos: started wikidata entity dumps json run on snapshot1008
* 21:37 nuria: deployment aborted for refinery 0.0.94
* 21:37 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: refinery 0.0.94 (duration: 36m 28s)
* 21:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/loadbalancer: [[phab:T228104|T228104]] rdbms: better handle a non-existing  defaultGroup in LoadBalancer (duration: 00m 55s)
* 21:15 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Clean up accidentally-deployed debugging code for [[phab:T228290|T228290]] (duration: 01m 02s)
* 21:10 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - [[phab:T227958|T227958]] (duration: 02m 52s)
* 21:07 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - [[phab:T227958|T227958]]
* 21:00 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: refinery 0.0.94
* 20:35 accraze@deploy1001: Finished deploy [ores/deploy@676f7ba]: [[phab:T228331|T228331]] (duration: 24m 59s)
* 20:10 accraze@deploy1001: Started deploy [ores/deploy@676f7ba]: [[phab:T228331|T228331]]
* 19:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/rdbms/loadbalancer: [[phab:T228104|T228104]] rdbms: better handle a non-existing  defaultGroup in LoadBalancer (duration: 00m 55s)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2181.codfw.wmnet
* 18:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqiad
* 18:28 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s codfw
* 18:26 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s esams
* 18:25 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s ulsfo
* 18:23 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqsin
* 18:20 cdanis: cdanis@mw1261.eqiad.wmnet ~ % sudo -i pool
* 18:19 cdanis: testing conftool upgrade: cdanis@mw1261.eqiad.wmnet ~ % sudo -i depool
* 18:15 mutante: mw2181 - sudo: /usr/local/bin/mwscript: command not found  on scap pull ??
* 18:14 mutante: mw2181 - scap pull ([[phab:T205240|T205240]])
* 18:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s mw-canary
* 18:02 cdanis: upgrade to python3-conftool 1.1.1-1 on mwdebug2001
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia conftool/conftool_1.1.1-1+deb8u1_amd64.changes
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia conftool/conftool_1.1.1-1+deb10u1_amd64.changes
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia conftool/conftool_1.1.1-1_amd64.changes
* 17:09 papaul: shutting down restbase2009 for firmware upgrade
* 17:06 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to 1.34.0-wmf.13"
* 16:57 dcausse: morning swat done
* 16:54 dcausse@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/ElasticaErrorHandler.php: [[phab:T228283|T228283]]: Log response data JSON on errors (duration: 00m 55s)
* 16:48 Urbanecm: Deployed patch for [[phab:T207094|T207094]]
* 16:47 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 16:40 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia{{!}}thirdparty/amd-rocm (not used anymore)
* 16:37 dcausse: reopening morning SWAT
* 16:24 papaul: shutting down mw2181 for firmware upgrade
* 16:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 16:19 jijiki: Depool mw2181 - [[phab:T205240|T205240]]
* 16:08 Urbanecm: Morning SWAT done
* 16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise zh_classicalwiki requirement for autoconfirmed ([[phab:T228141|T228141]]) (duration: 00m 55s)
* 16:07 cmjohnson1: powering off cloudvirt1014 for rack move [[phab:T226188|T226188]]
* 16:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:523686{{!}}Enable partial blocks on dewiki]] ([[phab:T228150|T228150]]) (duration: 00m 54s)
* 16:01 jbond42: copy confd package from stretch-wikimedia to buster-wikimedia
* 15:47 Urbanecm: Re-syncing patch for [[phab:T207094|T207094]] [[phab:T228284|T228284]] and wmf.14
* 15:37 Urbanecm: Deployed patch for [[phab:T207094|T207094]] [[phab:T228284|T228284]] to wmf.13 and wmf.14
* 15:15 fsero: restarting swift-container-sync on ms-be* for getting logging configuration [[phab:T228196|T228196]]
* 15:11 papaul: shutting down mw2250 for disk replacement
* 15:10 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:07 hashar: upgrading CI Jenkins # [[phab:T228142|T228142]]
* 15:06 papaul: shutting down ms-be2022 for HW  troubleshooting
* 15:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 jijiki: Depool mw2269 to reboot it - [[phab:T227548|T227548]]
* 15:00 godog: poweroff ms-be2022 - [[phab:T227667|T227667]]
* 14:55 moritzm: updated jenkins in thirdparty/ci (stretch) and thirdparty (jessie) to 2.176.2 ([[phab:T228142|T228142]])
* 14:45 fsero: enabling container-sync logging [[phab:T228196|T228196]]
* 14:41 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:41 otto@cumin1001: START - Cookbook sre.hosts.decommission
* 14:35 moritzm: restart pybal on lvs2002 (codfw primary) [[phab:T227778|T227778]]
* 14:32 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 14:31 gehel: repool maps1004 - [[phab:T218097|T218097]]
* 14:11 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 54s)
* 14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
* 14:09 moritzm: restarting pybal on backup LVSes in codfw
* 14:02 liw@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Do not serialize ResultsType instance [[phab:T228276|T228276]] (duration: 00m 55s)
* 13:37 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:26 moritzm: disabled puppet on Icinga hosts in preparation of adding the LDAP replicas/codfw to LVS
* 13:10 ema: cp-codfw: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 13:06 ema: prometheus servers: remove varnish-upload_$dc_backend.yaml, replaced by ATS equivalent [[phab:T227668|T227668]]
* 12:57 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 12:36 godog: upgrade hp raid firmware on ms-be1 hosts - [[phab:T141756|T141756]]
* 12:15 Urbanecm: Running foreachwiki extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php in tmux session on mwmaint1002 ([[phab:T209565|T209565]])
* 12:11 Urbanecm: Ran extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php for cawiki and viwiki ([[phab:T209565|T209565]])
* 11:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:30 mlitn@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/WikibaseMediaInfo: [WikibaseMediaInfo] Revert "Add Wikidata links to statement UI elements" (duration: 00m 56s)
* 11:16 dcausse: reindexing wikidata (elastic@eqiad) [[phab:T227136|T227136]]
* 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T227136|T227136]]: [cirrus] switch search traffic (except completion) to codfw (duration: 00m 54s)
* 10:53 moritzm: re-enabled icinga1001 in meta monitoring
* 10:41 godog: install updated linux-image-4.9.0-9-amd64 on ms-be hosts
* 10:30 godog: start rolling reboot of ms-be eqiad hosts - [[phab:T225713|T225713]]
* 10:30 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 10:23 moritzm: rebooting icinga1001 for kernel update
* 10:20 moritzm: disabled icinga1001 in meta monitoring
* 10:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:08 moritzm: rebooting lithium for kernel update
* 10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:33 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 09:23 moritzm: rebooting grafana1001 to pick up MDS-enabled qemu
* 09:21 ema: cp-ats: upgrade fifo-log-demux to 0.3 [[phab:T227668|T227668]]
* 09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status [[phab:T227862|T227862]] (duration: 00m 55s)
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:07 ema: upload fifo-log-demux 0.3 to stretch-wikimedia [[phab:T227668|T227668]]
* 08:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:36 jijiki: Disable puppet on thumbor* in eqiad, depool and pool back to apply 523728 - [[phab:T224572|T224572]]
* 08:17 jijiki: Pool mw1239 - [[phab:T227867|T227867]]
* 07:48 godog: swift eqiad-prod: put back ms-be1043 sdk1 - [[phab:T218544|T218544]]
* 07:46 ema: cp-esams: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 07:33 moritzm: reimaging sarin for some tests
* 06:59 elukey: apply mcrouter async replication to mw2224 - [[phab:T225642|T225642]]
* 06:25 elukey: reboot analytics1072 as an attempt to clear the megacli's config (and add a new disk)
* 06:20 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache
* 05:26 marostegui: Stop MySQL on db1065 for decommissioning - [[phab:T227560|T227560]]
* 05:24 marostegui: Remove db1065 from tendril and zarcillo - [[phab:T227560|T227560]]
* 03:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: [[phab:T227772|T227772]] (duration: 00m 54s)
* 03:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: [[phab:T227772|T227772]] (duration: 00m 56s)
* 03:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 54s)
* 02:58 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 57s)
* 00:50 mutante: wikitech-static commented out cert renewal cron job out of caution - still needs fixing but continue tomorrow
* 00:12 mutante: wikitech-static - adding (undocumented!) option webroot-map to certbot config to use webroot authenticator with different document roots per domain while using the config file and not cli params ([[phab:T214640|T214640]])
* 00:01 mutante: wikitech-static certbot --dry-run renew ([[phab:T214640|T214640]])
* 00:01 mutante: wikitech-static changing certbot renewalparams: authenticator = webroot (changed from standalone), install = apache (unchanged) ([[phab:T214640|T214640]])
 
== 2019-07-16 ==
* 23:53 RoanKattouw: Deployed patch for [[phab:T207094|T207094]]
* 23:27 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path ([[phab:T227929|T227929]]) (duration: 00m 55s)
* 23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path ([[phab:T227929|T227929]]) (duration: 00m 56s)
* 23:26 mutante: wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone ([[phab:T214640|T214640]])
* 23:26 fsero: repool ms-fe2005 [[phab:T228196|T228196]]
* 23:23 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> [[phab:T204840|T204840]]#5243222 i previously did the opposite change in [[phab:T214640|T214640]]#4907685 to fix it) and that takes down apache during the renewal ([[phab:T214640|T214640]])
* 23:20 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal
* 23:17 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow ([[phab:T212433|T212433]]) (duration: 00m 56s)
* 23:09 mutante: wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though ([[phab:T225258|T225258]])
* 22:35 fsero: uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 [[phab:T228196|T228196]]
* 22:29 mutante: wikitech-static the diff between the ssl config files in the repo and on server was just a space at the end of the ServerAdmin line .... [[phab:T225258|T225258]]
* 22:28 fsero: depooling ms-fe2005 for swift upload for registry [[phab:T228196|T228196]]
* 22:26 mutante: wikitech-static ran certbot with --dry-run renew to confirm cert renewal works and it was just fine .. 2 minutes later apache errors which were fixed by restarting apache2 ([[phab:T214640|T214640]])
* 22:24 mutante: wikitech-static restarted apache
* 22:11 mutante: wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts ([[phab:T225258|T225258]])
* 22:06 mutante: wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again ([[phab:T225258|T225258]])
* 21:17 bd808@deploy1001: Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade ([[phab:T221657|T221657]], [[phab:T227508|T227508]]) (duration: 01m 08s)
* 21:15 bd808@deploy1001: Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade ([[phab:T221657|T221657]], [[phab:T227508|T227508]])
* 20:55 SMalyshev: repooled wdqs2004 and wdqs2001 - reload done
* 20:26 mutante: ganeti1001 - gnt-instance remove netmon1003.wikimedia.org ([[phab:T220355|T220355]])
* 19:59 XioNoX: update ACLs on pfw3-eqiad/codfw - [[phab:T228205|T228205]]
* 19:52 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:51 fsero: republishing base images for wikimedia-(stretch,jessie and buster) [[phab:T228196|T228196]]
* 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:58 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:54 gehel: data copy from wdqs2004 to wdqs2001 - [[phab:T228122|T228122]]
* 18:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - Produce revision-create stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 54s)
* 18:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-create stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 54s)
* 18:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor config to point to REL1_33 as the released version (duration: 00m 54s)
* 18:05 fsero: republishing base images for nodejs-slim due to registry [[phab:T228196|T228196]]
* 18:02 andrewbogott: rebooting cloudcontrol2003-dev, cloudweb2001-dev, cloudcontrol1004 for [[phab:T225713|T225713]]
* 17:39 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce centralnotice.campaign-* streams to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 55s)
* 17:23 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to {{Gerrit|334a4c4}} ([[phab:T227907|T227907]]) (duration: 04m 51s)
* 17:19 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to {{Gerrit|334a4c4}} ([[phab:T227907|T227907]])
* 16:55 mutante: netmon1003: shutdown -h now {{!}} ganeti1001: gnt-instance shutdown netmon1003.wikmedia.org - removed from icinga  [[phab:T198939|T198939]] [[phab:T220355|T220355]]
* 16:36 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling  jobs to PHP7 - [[phab:T219150|T219150]] (duration: 00m 50s)
* 16:35 jiji@deploy1001: Started deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling  jobs to PHP7 - [[phab:T219150|T219150]]
* 16:28 dcausse: reindexing wikidata (elastic@eqiad) [[phab:T227136|T227136]]
* 15:57 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 15:37 elukey: reboot analytics1072 as an attempt to force the raid controller to set a drive failed - [[phab:T226467|T226467]]
* 15:12 elukey: start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108
* 14:53 elukey: stop mariadb on db1107 to allow maintenance
* 14:53 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance
* 14:52 jbond42: will restart redis on oresdb at 16:00 UTC - [[phab:T228045|T228045]]
* 14:51 jbond42: enable puppet across the fleet
* 14:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 14:40 jbond42: disable puppet across the fleet to make a change to the hiera
* 14:30 jijiki: Enable puppet and rolling restart thumbor* in codfw - [[phab:T224572|T224572]]
* 14:16 jijiki: Depool thumbor2001 and pool back - [[phab:T224572|T224572]]
* 14:13 jijiki: Disabling puppet on thumbor*codfw.wmnet - [[phab:T224572|T224572]]
* 14:08 liw: group0 to 1.34.0-wmf.14
* 14:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to php-1.34.0-wmf.14
* 13:41 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache (duration: 26m 45s)
* 13:24 vgutierrez: restarting pybal on lvs2001 and lvs1013
* 13:20 vgutierrez: restarting pybal on lvs2004 and lvs1016
* 13:14 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache
* 12:59 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.8 (duration: 01m 46s)
* 12:57 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.7 (duration: 02m 01s)
* 12:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.6 (duration: 02m 04s)
* 12:52 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 (duration: 02m 11s)
* 12:49 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 (duration: 07m 42s)
* 12:42 dcausse: deleting stale wikidata indices (elastic@eqiad) [[phab:T227136|T227136]]
* 12:11 jijiki: Depool mw1293 and pool back
* 11:57 moritzm: synched docker-ce, docker-ce-cli, containerd.io to thirdparty/ci for stretch-wikimedia ([[phab:T226236|T226236]])
* 11:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:12 moritzm: rebooting remaining swift frontends in eqiad to pick up a kernel with SACK fixed ([[phab:T228086|T228086]])
* 10:29 moritzm: rebooting ms-fe1005 to pick up kernel with SACK fixed ([[phab:T228086|T228086]])
* 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:17 vgutierrez: restart pybal on lvs1013
* 10:15 vgutierrez: restart pybal on lvs2001
* 10:11 vgutierrez: restarting pybal on lvs1016
* 10:08 vgutierrez: restarting pybal on lvs2004
* 10:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=ncredir,service=nginx
* 09:24 elukey: apply mcrouter async replication settings to mw1276 - [[phab:T225642|T225642]]
* 09:23 elukey: pool mw1261 back with mcrouter async replication settings - [[phab:T225642|T225642]]
* 08:50 fsero: upload coredns docker image into registry [[phab:T226516|T226516]]
* 08:44 jynus: dropping servermon accounts from m1 dbs [[phab:T198939|T198939]]
* 08:12 fsero: uploading coredns_1.5.2 for buster and stretch - [[phab:T226516|T226516]]
* 08:11 fsero: uploading coredns_1.5.2 for buster and stretch
* 07:45 elukey: depool mw1261 to test mcrouter changes
* 00:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/cache/LinkCache.php: {{Gerrit|4a5f4ca2fd788}} (duration: 00m 51s)
* 00:05 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Restrict AMC scripts and styles to AMC mode ([[phab:T227929|T227929]]) (duration: 00m 52s)
* 00:03 shdubsh: restart logstash to revert mitigations - [[phab:T228089|T228089]]
 
== 2019-07-15 ==
* 23:55 XioNoX: rotate network-root password
* 23:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:31 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove reference to non-existent feature flag (duration: 00m 51s)
* 22:33 XenoRyet: updated civicrm from {{Gerrit|8a4451f390}} to {{Gerrit|3be1a8c77c}}
* 22:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgNonincludableNamespaces, default, never varied (duration: 00m 52s)
* 22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop wmgEnableTabularData and wmgEnableMapData, unused (duration: 00m 55s)
* 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wmgEnableJsonConfigDataMode instead of wmgEnableTabularData and wmgEnableMapData (duration: 00m 56s)
* 21:56 jijiki: Depool mw1239 for maintenance - [[phab:T227867|T227867]]
* 21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgEnableJsonConfigDataMode to IS (duration: 00m 55s)
* 21:46 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add more severe rate limits for eswikiquote ([[phab:T227416|T227416]]) (duration: 00m 50s)
* 21:16 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:06 XioNoX: rollback `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - [[phab:T228015|T228015]]
* 20:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: [[phab:T227700|T227700]] / [[phab:T227700|T227700]]: getSubpage should not lose the interwiki prefix (duration: 00m 52s)
* 20:54 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to {{Gerrit|7fd39da}} ([[phab:T227907|T227907]]) (duration: 02m 24s)
* 20:52 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to {{Gerrit|7fd39da}} ([[phab:T227907|T227907]])
* 20:52 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to {{Gerrit|7fd39da}} ([[phab:T227907|T227907]]) (duration: 07m 53s)
* 20:50 Krinkle: deploy1001: Unable to fetch git commits from Gerrit for php-1.34.0-wmf.13 due to "error: cannot update the ref 'refs/remotes/origin/fundraising/REL1_31': unable to append to '.git/logs/refs/remotes/origin/fundraising/REL1_31': Permission denied"
* 20:47 XioNoX: add `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - [[phab:T228015|T228015]]
* 20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to {{Gerrit|7fd39da}} ([[phab:T227907|T227907]])
* 20:30 XioNoX: deactivate HE peering in eqsin - [[phab:T228015|T228015]]
* 20:02 jynus: reducing consistency of db2045 to avoid lag at [[phab:T227862|T227862]]
* 19:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:31 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio (duration: 01m 13s)
* 19:30 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio
* 19:27 ppchelko@deploy1001: Finished deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs (duration: 01m 28s)
* 19:26 ppchelko@deploy1001: Started deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs
* 19:25 XenoRyet: update payments-wiki from {{Gerrit|59ace50d66}} to {{Gerrit|224c6b2d7b}}
* 19:10 thcipriani: gerrit back
* 19:09 thcipriani: gerrit restart for v2.15.14
* 19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming) (duration: 00m 10s)
* 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming)
* 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001) (duration: 00m 12s)
* 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001)
* 19:05 shdubsh: restarting logstash on logstash1008
* 18:27 Urbanecm: Morning SWAT done
* 18:13 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove spam mitigations ([[phab:T200104|T200104]]) (duration: 00m 50s)
* 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:523202{{!}}GrowthExperiments: Enable WelcomeSurvey A/B test for arwiki]] ([[phab:T226221|T226221]]) (duration: 01m 02s)
* 18:07 jbond42: syncing puppetmaster1001 facts to compiler1001/1002
* 17:34 cdanis: downtime mr1-eqsin.oob IPv6 for 20h [[phab:T227967|T227967]]
* 16:58 jynus: setting labsdb1009/10/11 to performance scaling_governor [[phab:T225713|T225713]]
* 16:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-visibility-change stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 49s)
* 14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
* 14:08 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
* 14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 14:07 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
* 14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 14:06 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 07s)
* 14:06 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 14:04 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 05s)
* 14:04 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
* 13:55 elukey: enable profile::base::firewall on notebook100[3,4]
* 13:55 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 15s)
* 13:54 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
* 13:23 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
* 13:16 gehel: repooling maps eqiad - [[phab:T218097|T218097]]
* 13:02 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:523127{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 13:01 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:523127{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 12:59 gehel: depooling kartotherian eqiad - [[phab:T225713|T225713]]
* 12:59 gehel: re-enabling kartotherian codfw - [[phab:T225713|T225713]]
* 12:55 gehel: shutting down tilerator on maps eqiad to free some CPU - [[phab:T225713|T225713]]
* 12:54 gehel: shutting down tilerator on maps eqiad to free some CPU
* 12:52 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Delete Image-reviewer group from commonswiki for good ([[phab:T216406|T216406]]) (duration: 00m 51s)
* 12:50 gehel: restarting kartotherian on maps1002
* 12:35 gehel: reimporting OSM data for maps eqiad cluster - [[phab:T218097|T218097]]
* 12:25 moritzm: installing openjpeg2 security updates
* 12:20 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --bureaucrat Ladsgroup
* 12:16 jbond42: update redis on mwlog, pybal-test, maps and rdb*
* 12:10 moritzm: installing ldap-replica200[12] ([[phab:T227778|T227778]])
* 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:522126{{!}}Specify $wgWBRepoSettings['conceptBaseUri'] again (T225212)]] (duration: 00m 50s)
* 12:06 moritzm: removing myself from cn=tools.admin (currently not used, was mostly historical for debugging some Toollabs issue in the past)
* 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:522125{{!}}Specify $wmgWBRepoConceptBaseUri again (T225212)]] (duration: 00m 51s)
* 12:00 Urbanecm: Running mwscript initSiteStats.php --wiki=commonswiki --update to update Special:Statistics after a big change ([[phab:T216406|T216406]])
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Regrant image reviewers on commonswiki the ability to mass upload ([[phab:T216406|T216406]]) (duration: 00m 50s)
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520283{{!}}Rename `Image-reviewer` to `image-reviewer` for Commons]] (2/2, [[phab:T216406|T216406]]) (duration: 00m 48s)
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:520283{{!}}Rename `Image-reviewer` to `image-reviewer` for Commons]] (1/2, [[phab:T216406|T216406]]) (duration: 00m 50s)
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:523128{{!}}Enable partial blocks on the Finnish Wikipedia]] ([[phab:T228008|T228008]]) (duration: 00m 51s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:522987{{!}}Move private and fishbowl overrides from groupOverrides to groupOverrides2]] ([[phab:T227980|T227980]]) (duration: 00m 51s)
* 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/http/MultiHttpClient.php: SWAT: [[:gerrit:522951{{!}}Raise default reqTimeout in MultiHttpClient]] ([[phab:T226979|T226979]]) (duration: 00m 51s)
* 11:23 moritzm: installing python-django security updates on jessie
* 11:22 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: SWAT: [[:gerrit:522871{{!}}When title contains only slashes, Title::getRootText() shouldnt return false]] ([[phab:T227816|T227816]]) (duration: 00m 51s)
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:523003{{!}}Enable WikiLove and SandboxLink on sqwiki]] ([[phab:T227970|T227970]]) (duration: 00m 51s)
* 11:15 Urbanecm: Running mwscript extensions/WikimediaMaintenance/createExtensionTables.php sqwiki wikilove for [[phab:T227970|T227970]]
* 11:13 Urbanecm: Running mwscript migrateUserGroup.php --wiki=commonswiki Image-reviewer image-reviewer for [[phab:T216406|T216406]]
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disallow admins to grant or revoke image reviewer due to migration ([[phab:T216406|T216406]]) (duration: 00m 50s)
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:523006{{!}}Create image-reviewer for commonswiki with same rights as Image-reviewer]] ([[phab:T216406|T216406]]) (duration: 00m 52s)
* 10:52 moritzm: installing ldap-replica200[12] ([[phab:T227778|T227778]])
* 10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:523127{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:523127{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 09:56 ema: cp-eqsin: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 09:39 fsero: repooling ms-fe2005 [[phab:T227570|T227570]]
* 08:50 fsero: creating docker_registry_codfw on eqiad [[phab:T227570|T227570]]
* 08:49 gehel: correction: set oemhp_powerreg=os + reboot for elastic1052 (NOT elastic1054) - [[phab:T225713|T225713]]
* 08:49 fsero: [[phab:T227570|T227570]] changing container_synchronization on docker_registry_codfw to //docker_registry/eqiad/AUTH_docker/docker_registry_codfw
* 08:48 gehel: set oemhp_powerreg=os + reboot for elastic1054 - [[phab:T225713|T225713]]
* 08:22 godog: set oemhp_powerreg=os on ms-be10[16-39] - [[phab:T225713|T225713]]
* 08:01 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief production instances - [[phab:T225945|T225945]]
 
== 2019-07-14 ==
* 13:18 godog: silence mr1-eqsin.oob IPv6 until tomorrow 8 UTC - [[phab:T227967|T227967]]
* 12:01 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sporti /home/urbanecm/T227968 for server side upload
 
== 2019-07-13 ==
* 01:51 MaxSem: Disabled 2FA for my staff account
 
== 2019-07-12 ==
* 23:35 mutante: netmon1003 - shutdown -h now after it's gone from Icinga now
* 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 23:28 mutante: netmon1003 - stopping apache2 service (decom of servermon.wikimedia.org)
* 19:41 James_F: Disabled 2FA for MSchottlender-WMF for device reset.
* 19:17 shdubsh: add prometheus-varnishkafka-exporter 0.1 to apt repo [[phab:T196066|T196066]]
* 19:15 urandom: bootstrapping restbase1017-c -- [[phab:T222960|T222960]]
* 19:08 jeh: rebooting cloudvirt1018.eqiad.wmnet [[phab:T216040|T216040]]
* 18:53 mutante: cp1072 - enabling notifications for service checks in icinga; they were disabled but all green, with no SAL entry or ticket. Looks like they were forgotten at some point in the past
* 18:49 gehel: setting CPU governor to performance for wdqs1010 - [[phab:T225713|T225713]]
* 18:16 Krinkle: Remove bogus Graphite data at frontend.navtiming2.requet (typo from Nov 2018), graphite1004/2003
* 18:02 urandom: bootstrapping restbase1017-b -- [[phab:T222960|T222960]]
* 16:32 urandom: bootstrapping restbase1017-a -- [[phab:T222960|T222960]]
* 16:25 jijiki: Rolling restart swift proxy on ms-fe*
* 15:25 jeh: rebooting cloudvirt1018.eqiad.wmnet [[phab:T216040|T216040]]
* 14:05 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 12:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:39 fsero: recreating ci staging namespaces [[phab:T227775|T227775]]
* 12:39 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 12:38 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 12:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 12:22 fsero: recreating eventgate-* and blubberoid staging namespaces [[phab:T227775|T227775]]
* 12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 12:15 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 12:11 fsero: recreating sessionstore,cxserver and mathoid staging namespaces [[phab:T227775|T227775]]
* 12:10 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 12:06 fsero: recreating citoid staging namespace [[phab:T227775|T227775]]
* 12:05 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 12:01 fsero: recreating termbox staging namespace [[phab:T227775|T227775]]
* 11:09 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Switchover db2045 x1 codfw master to db2069 (duration: 00m 51s)
* 10:24 jynus: switchover x1 codfw master from db2045 to db2069 [[phab:T227862|T227862]]
* 10:23 jynus: switchover x1 codfw master from db2045 to db2069
* 09:43 moritzm: shut down ldap-codfw-replica01/ldap-codfw-replica02 (pending reimage)
* 08:18 jijiki: enable puppet on mw1222
* 06:35 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief test instances - [[phab:T225945|T225945]]
* 06:28 vgutierrez: uploaded acme-chief 0.19 to apt.wikimedia.org (buster) - [[phab:T225945|T225945]]
* 05:45 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to clear opcache
* 01:01 Krinkle: mw1342 generated roughly 11,500 additional PHP errors over a 4 hour period (18:00-22:30 UTC), ref [[phab:T224491|T224491]]
* 00:59 Krinkle: mw1342 is generating strange PHP errors (php7 only), ref [[phab:T224491|T224491]]
* 00:58 urandom: bootstrapping restbase1017-a -- [[phab:T222960|T222960]]
* 00:50 mutante: restbase1018 - restart ferm service
* 00:15 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e4bd91f71b}} (duration: 00m 50s)
* 00:13 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f309856f0912}} (duration: 00m 50s)
* 00:03 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 ([[phab:T222960|T222960]]) (duration: 00m 03s)
* 00:03 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 ([[phab:T222960|T222960]])
* 00:01 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 ([[phab:T222960|T222960]]) (duration: 00m 25s)
* 00:01 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 ([[phab:T222960|T222960]])
 
== 2019-07-11 ==
* 23:58 thcipriani@deploy1001: Synchronized php-1.34.0-wmf.13/includes/watcheditem/WatchedItemStore.php: SWAT: [[gerrit:522155{{!}}WatchedItemStore: Fix fatal when revision is deleted]] [[phab:T226741|T226741]] (duration: 00m 51s)
* 23:49 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 ([[phab:T222960|T222960]]) (duration: 00m 47s)
* 23:48 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 ([[phab:T222960|T222960]])
* 23:47 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided) (duration: 01m 56s)
* 23:45 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
* 23:38 eevans@deploy1001: deploy aborted: (no justification provided) (duration: 02m 00s)
* 23:36 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
* 23:15 thcipriani@deploy1001: Synchronized wmf-config: SWAT: [[gerrit:521338{{!}}Oversample all EditAttemptStep events on VE-as-mobile-default wikis]] [[phab:T227317|T227317]] (duration: 00m 50s)
* 22:59 mutante: netmon1003 - removing servermon - servermon.wikimedia.org is being decom'ed  ([[phab:T198939|T198939]])
* 22:37 RoanKattouw: Deployed fix for [[phab:T224240|T224240]], accidentally rode along with Tyler's no-op scap
* 22:34 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: wikidatawiki back to 1.34.0-wmf.13
* 22:26 thcipriani@deploy1001: Finished scap: no op scap sync to rebuild l10n-cache ([[phab:T227814|T227814]]) (duration: 19m 34s)
* 22:07 thcipriani@deploy1001: Started scap: no op scap sync to rebuild l10n-cache ([[phab:T227814|T227814]])
* 21:23 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 02m 02s)
* 21:21 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
* 20:22 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
* 20:22 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
* 20:20 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 03s)
* 20:20 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
* 20:19 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
* 20:19 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
* 20:18 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
* 20:18 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
* 20:11 milimetric@deploy1001: deploy aborted: Fix to reimport cu_changes (duration: 27m 34s)
* 20:03 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert wikidata to 1.34.0-wmf.11
* 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@3296aab]: Fix to reimport cu_changes
* 19:29 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.13  refs [[phab:T220738|T220738]]
* 18:09 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.13  refs [[phab:T220738|T220738]] (duration: 00m 57s)
* 18:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.13  refs [[phab:T220738|T220738]]
* 18:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s eqiad
* 17:37 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s codfw
* 17:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s esams
* 16:48 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s eqsin
* 16:19 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s ulsfo
* 16:12 XioNoX: revert deactivate ping-offload in eqiad for server reboot
* 16:03 moritzm: rebooting ping1001 to pick up MDS-enabled qemu
* 16:02 cdanis: repool cp4022 after testing conftool change
* 15:59 XioNoX: deactivate ping-offload in eqiad for server reboot
* 15:58 cdanis: depool cp4022 for testing conftool change
* 15:58 XioNoX: revert deactivate ping-offload in codfw for server reboot
* 15:56 moritzm: installing dnspython update from stretch point release
* 15:53 moritzm: rebooting ping2001 to pick up MDS-enabled qemu
* 15:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:50 XioNoX: deactivate ping-offload in codfw for server reboot
* 15:45 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt) (duration: 00m 11s)
* 15:45 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt)
* 15:44 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only) (duration: 00m 11s)
* 15:44 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only)
* 15:28 gehel: setting CPU governor to performance for wdqs1004 - [[phab:T225713|T225713]]
* 15:28 cdanis: upgrade to python3-conftool 1.1.0-1 on cp4022
* 15:05 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s cp-canary
* 15:00 hashar_: restarted Jenkins for plugins upgrades
* 14:57 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u [[phab:T197126|T197126]]-2019-07-11-conftool.yaml -s mw-canary
* 14:55 gehel: setting CPU governor to performance for elastic1052 - [[phab:T225713|T225713]]
* 14:51 cdanis: upgrade to python3-conftool 1.1.0-1 on mwdebug2001
* 14:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/database/Database.php: {{Gerrit|903f3f94f5d2e3}} / [[phab:T227708|T227708]] (duration: 00m 59s)
* 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia /home/volans/conftool/stretch/conftool_1.1.0-1_amd64.changes
* 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia /home/volans/conftool/jessie/conftool_1.1.0-1+deb8u1_amd64.changes
* 14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia /home/volans/conftool/buster/conftool_1.1.0-1+deb10u1_amd64.changes
* 14:17 ema: restart wikibugs
* 13:40 godog: roll restart ms-be2016 ms-be2017 ms-be2018 ms-be2019 ms-be2020 ms-be2021 ms-be2028 ms-be2029 ms-be2030 ms-be2031 ms-be2032 ms-be2033 ms-be2034 ms-be2035 ms-be2036 - [[phab:T225713|T225713]]
* 13:00 ema: cp-ulsfo: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 12:48 ema: fleet-wide: remove obsolete file /etc/debdeploy-autorestarts.conf
* 12:44 ema: cp-ulsfo: upgrade mtail to 3.0.0~rc5-1~bpo9+1wmf1
* 12:44 Urbanecm: Running purgePage.php on pages in Page: NS on pawikisource ([[phab:T226959|T226959]])
* 12:39 jijiki: Disable puppet on mw1222, server will be depooled and pooled a few times for tests - [[phab:T224538|T224538]]
* 12:07 godog: ms-be2031 raid controller firmware upgrade 4.52 -> 6.88 - [[phab:T141756|T141756]]
* 12:03 godog: power reset ms-be2031, stuck and nothing on console
* 11:56 Urbanecm: EU SWAT done
* 11:54 urbanecm@deploy1001: Finished scap: Namespace translation for Punjabi ([[phab:T226959|T226959]]) (duration: 30m 13s)
* 11:24 urbanecm@deploy1001: Started scap: Namespace translation for Punjabi ([[phab:T226959|T226959]])
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521933{{!}}Remove usergroup communityapps from officewiki]] ([[phab:T227680|T227680]]) (duration: 01m 02s)
* 11:20 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: [[:gerrit:522035{{!}}Remove commonswiki from mobilemainpagelegacy]] ([[phab:T227719|T227719]]) (duration: 00m 58s)
* 11:14 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group2 (duration: 01m 02s)
* 10:45 moritzm: installing ldap-codfw-replica*
* 10:28 fsero: depooling ms-fe2005 for docker_registry_backups [[phab:T227570|T227570]]
* 10:08 fsero: creating swift docker_registry_container_backup [[phab:T227570|T227570]]
* 09:56 moritzm: re-enabling puppet (puppetdb reboots completed)
* 09:47 moritzm: rebooting puppetdb1001 to pick up MDS-enabled qemu
* 09:35 moritzm: rebooting puppetdb2001 to pick up MDS-enabled qemu
* 09:31 moritzm: disabling puppet temporarily (for puppetdb reboots)
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:51 godog: upload mtail 3.0.0~rc5-1~bpo9+1wmf1 to stretch-wikimedia - [[phab:T225604|T225604]]
* 08:14 ema: cp-ulsfo: downgrade mtail to 3.0.0~rc5-1~bpo9+1 to fix varnishmtail-backend [[phab:T225604|T225604]]
* 07:43 moritzm: installing ldap-codfw-replica* [[phab:T227669|T227669]]
* 07:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:21 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 07:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 07:10 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 07:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 02:27 ejegg: updated payments-wiki from {{Gerrit|4c1261fe5d}} to {{Gerrit|59ace50d66}}
 
== 2019-07-10 ==
* 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CirrusSearch/includes: [[phab:T227691|T227691]] RedirectsAndIncomingLinks: succeed or fail, but not both (duration: 01m 02s)
* 23:02 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/OAuth/includes/backend/MWOAuthUtils.php: [[phab:T227688|T227688]] OAuth: Do not rely on array autocreation for custom User properties; re-try (duration: 00m 58s)
* 22:59 jforrester@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 22:57 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/user/User.php: [[phab:T227688|T227688]] User: support setting custom fields + array autocreation in non-existent field (duration: 00m 58s)
* 22:46 shdubsh: downgrading cp4031 to mtail_3.0.0~rc5-1~bpo9+1wmf1 to fix varnishmtail [[phab:T225604|T225604]]
* 22:46 jforrester@deploy1001: Synchronized w: [[phab:T156319|T156319]] Remove /w/skin-1.5 symlink (duration: 00m 58s)
* 22:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212865|T212865]] Stop configuring ZeroBanner and ZeroPortal, unused (duration: 00m 58s)
* 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T212865|T212865]] Drop the ability to use ZeroBanner and ZeroPortal from production (duration: 00m 57s)
* 22:03 jforrester@deploy1001: Synchronized wmf-config/mobile.php: [[phab:T212865|T212865]] Drop the ability to use ZeroBanner and ZeroPortal from production, mobile code (duration: 00m 57s)
* 21:59 jforrester@deploy1001: Synchronized w/robots.php: [[phab:T212865|T212865]] Drop the special treatment for Wikipedia Zero (duration: 00m 58s)
* 21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212865|T212865]] Drop the Wikipedia Zero debug log channel (duration: 00m 58s)
* 21:51 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T187716|T187716]] Drop all zerowiki configuration (duration: 00m 58s)
* 21:50 mutante: mwdebug1002 - php7adm /opcache-free  because icinga showed a warning for opcache free space below 100MB
* 21:49 jforrester@deploy1001: Synchronized dblists/: [[phab:T187716|T187716]] Mark zerowiki as deleted in dblists (duration: 01m 00s)
* 21:41 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T212865|T212865]] Disable ZeroBanner on all wikis (duration: 00m 59s)
* 21:36 mutante: mw1235 - restarting hhvm (socket timeout alert in icinga since about 1.5h)
* 21:35 mutante: mw1290 - restarting hhvm (socket timeout alert in icinga since about 5h)
* 19:45 hoo: Updated the Wikidata property suggester with data from the 2019-07-01 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 19:32 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce recentchange stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 57s)
* 19:26 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wgEventServiceStreamConfig to configure wgRCFeeds eventbus. No-op in prod. - [[phab:T211248|T211248]] (duration: 00m 58s)
* 19:05 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - [[phab:T219150|T219150]] (duration: 01m 00s)
* 19:04 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - [[phab:T219150|T219150]]
* 18:15 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Linker.php: [[phab:T227656|T227656]] Fix visibility of IPs that aren't suppressed (duration: 00m 59s)
* 17:54 twentyafterfour: phabricator: hotfixing fatal error by pulling upstream fix ( see https://secure.phabricator.com/D20644 )
* 16:09 Urbanecm: Morning SWAT done
* 16:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521865{{!}}Change bawikibooks logo to correct one according to community wish]] (2/2, [[phab:T227418|T227418]]) (duration: 00m 58s)
* 16:07 Urbanecm: Purged two urls for [[phab:T227418|T227418]]
* 16:06 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: [[:gerrit:521865{{!}}Change bawikibooks logo to correct one according to community]] (1/2, [[phab:T227418|T227418]]) (duration: 01m 16s)
* 16:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: [[:gerrit:521497{{!}}Disable local uploads on wuuwiki]] ([[phab:T226764|T226764]]) (duration: 00m 58s)
* 15:23 ema: cp-ulsfo: upgrade varnish to 5.1.3-1wm11 [[phab:T227672|T227672]]
* 15:08 ema: restart wb2-phab wikibugs job
* 14:51 ema: upload varnish 5.1.3-1wm11 to stretch-wikimedia [[phab:T227672|T227672]]
* 14:42 godog: reimage ms-be2022 - [[phab:T227667|T227667]]
* 14:03 jbond42: copy puppetdb-termini 4.4.0-1~wmf2 from stretch-wikimedia to jessie-wikimedia
* 13:47 ema: cp hosts: cleanup WP zero leftovers [[phab:T213769|T213769]]
* 13:22 godog: reset ilo on ms-be2022 - bios can't talk to it on boot
* 12:49 godog: reboot ms-be2022 - [[phab:T225713|T225713]]
* 11:53 Urbanecm: Purged 14 urls for [[phab:T211413|T211413]]
* 11:51 Urbanecm: Purged 24 urls for [[phab:T227635|T227635]]
* 11:11 Urbanecm: EU SWAT done
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove autopromote to patroller on testwiki ([[phab:T168718|T168718]]) (duration: 00m 58s)
* 11:10 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Several logo changes ([[phab:T227635|T227635]] [[phab:T211413|T211413]]) (duration: 01m 00s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove fawikiquote HD logo ([[phab:T211413|T211413]]) (duration: 00m 57s)
* 11:07 urbanecm@deploy1001: sync-file aborted: SWAT: Several logo changes ([[phab:T227635|T227635]] [[phab:T211413|T211413]]) (duration: 00m 20s)
* 11:06 urbanecm@deploy1001: Synchronized docroot/noc/conf/highlight.php: SWAT: [[:gerrit:521576{{!}}Fix non-working "raw text" links on noc.wikimedia.org web pages]] ([[phab:T227606|T227606]]) (duration: 01m 02s)
* 09:57 moritzm: re-enabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts (actually did that 20 minutes ago, but forgot to log it earlier)
* 09:54 jynus: disabling puppet on prometheus* hosts for upcoming deploy
* 09:38 fsero: doing the same on ms-be1030
* 09:37 fsero: docker-registry: manually running swift-container-sync once on ms-be2019
* 09:36 moritzm: rearmed keyholder on acmechief1001
* 09:29 moritzm: rebooting acmechief1001 to pick up MDS-enabled qemu
* 09:25 moritzm: rearmed keyholder on acmechief2001
* 09:22 moritzm: rebooting acmechief2001 to pick up MDS-enabled qemu
* 09:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:19 moritzm: disabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts
* 08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
* 05:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1079 after upgrade (duration: 00m 57s)
* 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1079 after upgrade (duration: 00m 57s)
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after upgrade (duration: 00m 58s)
* 05:05 marostegui: Upgrade db1079
* 05:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 for upgrade (duration: 00m 59s)
 
== 2019-07-09 ==
* 23:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13  refs [[phab:T220738|T220738]]
* 23:06 robh: updating power ports on [[phab:T209101|T209101]] and disabling ports not in use (only turning off one side and awaiting any icinga alerts for 15 minutes before touching other side of power)
* 22:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/AbuseFilter/includes/AbuseFilter.php: {{Gerrit|0096dff3022}} / [[phab:T227613|T227613]] (duration: 00m 57s)
* 22:52 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/SecurePoll/includes/pages/: {{Gerrit|c7d7a55b8e8d947234a9}} / [[phab:T227620|T227620]] (duration: 00m 57s)
* 22:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/includes/CollectionProposals.php: [[phab:T227407|T227407]] / {{Gerrit|69a30966c}} (duration: 00m 57s)
* 21:53 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/: [[phab:T226770|T226770]] / {{Gerrit|4c2a58589f2db}} (duration: 00m 59s)
* 20:58 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.11"
* 20:37 mutante: scb1001 - re-activate puppet, run puppet, stop pdfrender service, run puppet again ([[phab:T226675|T226675]])
* 20:36 mutante: scb2001 - sudo systemctl stop pdfrender ([[phab:T226675|T226675]])
* 20:25 mutante: temp disabling puppet on scb1001 - removing pdfrender classes from scb2001
* 20:23 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13  refs [[phab:T220738|T220738]]
* 20:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.13 (duration: 36m 39s)
* 19:36 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.13
* 19:17 XioNoX: enable sampling on cr2-eqiad:border-in4
* 19:14 XioNoX: replace netflow target on cr2-eqiad with netflow1001
* 18:19 longma: cutting the branch for 1.34.0-wmf.13 [[phab:T220738|T220738]]
* 17:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2 (duration: 02m 04s)
* 17:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2
* 17:30 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint ([[phab:T227481|T227481]]) (duration: 03m 49s)
* 17:26 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint ([[phab:T227481|T227481]])
* 16:59 godog: reboot ms-be2039 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 16:54 godog: reboot ms-be2027 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 16:42 godog: reboot ms-be2026 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 16:29 godog: reboot ms-be2025 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 15:44 XioNoX: reject RPKI invalids on Ashburn peering links - [[phab:T220669|T220669]]
* 15:38 akosiaris: restart pybal on lvs2003, lvs1015. Removal of pdfrender service [[phab:T226675|T226675]]
* 15:38 XioNoX: reject RPKI invalids on Amsterdam peering link - [[phab:T220669|T220669]]
* 15:33 akosiaris: restart pybal on lvs2006, lvs1016. Removal of pdfrender service [[phab:T226675|T226675]]
* 15:28 XioNoX: reject RPKI invalids on Chicago peering link - [[phab:T220669|T220669]]
* 15:27 godog: reboot ms-be2024 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 15:22 godog: reboot ms-be2023 with oemhp_powerreg=os - [[phab:T225713|T225713]]
* 15:20 XioNoX: reject RPKI invalids on Singapore peering link - [[phab:T220669|T220669]]
* 15:13 XioNoX: reject RPKI invalids on Dallas peering link - [[phab:T220669|T220669]]
* 15:03 jeh: rebooting cloudnet1003.eqiad [[phab:T224228|T224228]]
* 14:53 gehel: repooled elastic2054 - [[phab:T227298|T227298]]
* 14:50 moritzm: installing orespoolcounter100[34] [[phab:T227567|T227567]]
* 14:42 XioNoX: reject RPKI invalids on ulsfo peering link - [[phab:T220669|T220669]]
* 14:29 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - [[phab:T219150|T219150]] (duration: 01m 02s)
* 14:28 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - [[phab:T219150|T219150]]
* 14:28 jeh: rebooting cloudnet1004.eqiad [[phab:T224228|T224228]]
* 14:21 tarrow@deploy1001: scap-helm termbox finished
* 14:21 tarrow@deploy1001: scap-helm termbox cluster staging completed
* 14:21 tarrow@deploy1001: scap-helm termbox upgrade staging stable/termbox -f termbox-staging-values.yaml [namespace: termbox, clusters: staging]
* 13:59 moritzm: installing orespoolcounter200[34] [[phab:T227567|T227567]]
* 13:26 elukey: enable base::firewall on stat1007
* 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:27 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:21 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:18 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:13 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:11 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:57 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:47 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 Urbanecm: EU SWAT done
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: [[:gerrit:521383{{!}}Disable flaggedrevs for hewikisource main page]] ([[phab:T227000|T227000]]) (duration: 00m 48s)
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521390{{!}}Clean up `wgNamespacesWithSubpages` to remove unneeded entries]] ([[phab:T227546|T227546]]) (duration: 00m 49s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:517933{{!}}Configuration migration for Translate]] ([[phab:T87985|T87985]]) (duration: 00m 49s)
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521298{{!}}Configure help urls for MediaInfo]] ([[phab:T227226|T227226]]) (duration: 00m 50s)
* 10:39 elukey: update wikimedia-buster thirdparty/amd-rocm component with upstream packages - [[phab:T224723|T224723]]
* 10:14 jbond42: upgrade openssl on canary systems
* 09:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
* 09:26 ema: cp1076: restart trafficserver with storage.config set to /dev/nvme0n1
* 09:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=ats-be
* 09:13 elukey: enable per-server metrics on all prometheus-mcrouter-exporter(s) via puppet - [[phab:T225059|T225059]]
* 09:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 after upgrade (duration: 00m 49s)
* 08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after upgrade (duration: 00m 47s)
* 08:49 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-eqiad (cumin alias) via debdeploy - [[phab:T225059|T225059]]
* 08:41 marostegui: Upgrade db1086
* 08:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 51s)
* 08:36 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-codfw (cumin alias) via debdeploy - [[phab:T225059|T225059]]
* 08:08 moritzm: installing zeromq3 security updates
* 08:00 marostegui: Upgrade db1065 to 10.1.39
* 07:39 moritzm: pruning unused libzmq3/python-zmq packages from swift/parsoid hosts
* 07:26 elukey: upload prometheus-mcrouter-exporter 0.0.0+git20190709-1 to stretch-wikimedia - [[phab:T225059|T225059]]
* 06:00 marostegui: Failover m2 from db1065 to db1132 - [[phab:T226952|T226952]]
* 05:19 marostegui: Start switchover steps [[phab:T226952|T226952]]
* 05:13 marostegui: Rebooting pc2010 for a second time as per papaul's suggestion [[phab:T227552|T227552]]
* 04:53 marostegui: Reboot pc2010 to debug a memory issue
* 01:47 XioNoX: restart PHP FPM on mwdebug2001
* 01:35 XioNoX: restart PHP FPM on mwdebug1002
 
== 2019-07-08 ==
* 23:03 tzatziki: changing password for user "Naomi.piquette"
* 20:57 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1004.wikimedia.org ([[phab:T227411|T227411]])
* 20:53 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1003.wikimedia.org ([[phab:T227411|T227411]])
* 19:38 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T227502|T227502]] (duration: 00m 50s)
* 19:23 moritzm: uploaded prometheus-pdns-exporter 0.4.1 to stretch-wikimedia [[phab:T227411|T227411]]
* 18:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-* streams to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 50s)
* 18:33 moritzm: installing zeromq3 security updates
* 18:15 Urbanecm: Morning SWAT done
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521308{{!}}Change liwikinews logo to correct one per community wish]] (2/2, [[phab:T227418|T227418]]) (duration: 00m 49s)
* 18:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: [[:gerrit:521308{{!}}Change liwikinews logo to correct one per community wish]] (1/2, [[phab:T227418|T227418]]) (duration: 00m 49s)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521191{{!}}Add templateeditor user group and protection level on commons]] ([[phab:T227420|T227420]]) (duration: 00m 49s)
* 18:06 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: SWAT: [[:gerrit:520446{{!}}[cirrus] Increase elastic master timeout to 5m]] ([[phab:T227136|T227136]]) (duration: 00m 49s)
* 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520078{{!}}Enable RDF output for MediaInfo]] ([[phab:T221916|T221916]]) (duration: 00m 49s)
* 17:20 gehel@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version (duration: 12m 47s)
* 17:08 gehel@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version
* 16:40 eevans@deploy1001: scap-helm sessionstore finished
* 16:40 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 16:40 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 16:39 eevans@deploy1001: scap-helm sessionstore finished
* 16:38 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 16:36 eevans@deploy1001: scap-helm sessionstore finished
* 16:36 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 16:36 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 16:05 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive - part III (duration: 00m 50s)
* 15:59 godog: bounce prometheus@k8s on prometheus200[34] - [[phab:T227478|T227478]]
* 15:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 instead of db2069 as x1 codfw master (duration: 00m 49s)
* 15:45 marostegui: Failover db2069 to db2045 on x1 codfw
* 15:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2069 as x1 codfw master (duration: 00m 50s)
* 15:15 jynus: shutting down db2097 [[phab:T225378|T225378]] [[phab:T216240|T216240]]
* 15:13 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - [[phab:T219150|T219150]] (duration: 01m 26s)
* 15:12 jiji@deploy1001: Started deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - [[phab:T219150|T219150]]
* 15:07 eevans@deploy1001: scap-helm sessionstore finished
* 15:07 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 15:07 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 15:04 eevans@deploy1001: scap-helm sessionstore finished
* 15:04 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 15:04 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 14:57 marostegui: Failover x1 codfw from db2045 to db2069
* 14:48 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert (duration: 16m 07s)
* 14:45 marostegui: Restart MySQL on db1132 to enable performance_schema - [[phab:T226952|T226952]]
* 14:43 urandom: decommissioning restbase1017-c -- [[phab:T222960|T222960]]
* 14:32 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert
* 14:21 papaul: shutting down elastic2054 for troubleshooting
* 14:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints [[phab:T225733|T225733]], suggestions endpoints [[phab:T224754|T224754]], fix summary purging [[phab:T226983|T226983]] (duration: 16m 11s)
* 14:03 eevans@deploy1001: scap-helm sessionstore finished
* 14:03 eevans@deploy1001: scap-helm sessionstore cluster staging completed
* 14:03 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
* 13:53 godog: reprepro --delete clearvanished on install1002 to cleanup trusty
* 13:52 elukey: import AMD ROCm's Debian repo key (9386B48A1A693C5C) manually on install1002 - [[phab:T224723|T224723]]
* 13:51 moritzm: running "apt-get --allow-releaseinfo-update" on all buster hosts which were installed prior to the final buster release
* 13:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints [[phab:T225733|T225733]], suggestions endpoints [[phab:T224754|T224754]], fix summary purging [[phab:T226983|T226983]]
* 13:30 godog: bounce prometheus@k8s on prometheus1003
* 12:52 godog: copy mtail to buster-wikimedia - [[phab:T225604|T225604]]
* 12:42 kartik@deploy1001: scap-helm cxserver finished
* 12:42 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 12:42 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 12:39 kartik@deploy1001: scap-helm cxserver finished
* 12:39 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 12:39 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 12:36 kartik@deploy1001: scap-helm cxserver finished
* 12:36 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 12:36 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 11:47 Urbanecm: EU SWAT done
* 11:44 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/includes/Title.php: SWAT: [[:gerrit:521253{{!}}Title: ensure getBaseTitle and getRootTitle return valid Titles]] ([[phab:T225585|T225585]]) (duration: 00m 50s)
* 11:39 Urbanecm: Purged 14 logo urls for [[phab:T227418|T227418]]
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: SWAT: [[:gerrit:521038{{!}}Fix array shape for $wgCirrusSearchExtraIndexes]] ([[phab:T227379|T227379]]) (duration: 00m 51s)
* 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521183{{!}}Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry]] (2/2, [[phab:T227418|T227418]]) (duration: 00m 49s)
* 11:30 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: [[:gerrit:521183{{!}}Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry]] (1/2, [[phab:T227418|T227418]]) (duration: 00m 49s)
* 11:26 moritzm: installing poolcounter1004/1005
* 11:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/AbuseFilter/: SWAT: [[:gerrit:520991{{!}}Fix query in normalizeThrottleParameters]] ([[phab:T209565|T209565]]) (duration: 00m 51s)
* 11:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:520780{{!}}Disable Wikidata for ProofreadPage namespaces (T227201)]] (duration: 00m 50s)
* 11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:521221{{!}}Enable jsonld output format for wikibase entities everywhere (T207168)]] (duration: 00m 49s)
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: [[:gerrit:521194{{!}}Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource]] ([[phab:T227000|T227000]]) (duration: 00m 49s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520997{{!}}Add several Ukrainian government websites to wgCopyUploadsDomains]] ([[phab:T227366|T227366]]) (duration: 00m 49s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520507{{!}}Create "autopatrolled" user group on az.wiktionary]] ([[phab:T227208|T227208]]) (duration: 00m 49s)
* 11:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: [[:gerrit:520507{{!}}Create "autopatrolled" user group on az.wiktionary]] ([[phab:T227208|T227208]]) (duration: 00m 50s)
* 10:56 moritzm: installing poolcounter2003/2004
* 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:521246{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:521246{{!}} Bumping portals to master (T128546)]] (duration: 00m 51s)
* 09:51 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 09:51 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:49 ema: removed /srv/prometheus/ops/targets/varnish-upload-ats_mtail_$DC.yaml from prometheus hosts
* 08:27 moritzm: updated buster installer images to final release
* 07:43 moritzm: rebooting hassium to pick up MDS-enabled qemu
* 07:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:43 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:40 moritzm: rebooting weblog1001 for kernel security update
* 07:38 jynus: deploying sys schema to missing db production hosts
* 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:00 elukey: add base::firewall to stat1004 - [[phab:T170826|T170826]]
* 06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 after changing its binlog format (duration: 00m 49s)
* 06:36 marostegui: Run compare for s5 main tables on db2038 vs db2059 - [[phab:T221533|T221533]]
* 06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 after changing its binlog format (duration: 00m 49s)
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 after upgrade, slowly repool db1109 after changing its binlog format (duration: 00m 49s)
* 05:45 marostegui: Restart MySQL on db1109 to pick up STATEMENT as binlog format - [[phab:T227062|T227062]]
* 05:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for binlog format change (duration: 00m 49s)
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1094 after upgrade (duration: 00m 51s)
* 05:31 marostegui: Compress medium wikis on labsdb1009 - [[phab:T222978|T222978]]
* 05:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after upgrade (duration: 00m 49s)
* 05:22 marostegui: Drop empty table edit_page_tracking from some s3 wikis - [[phab:T57385|T57385]]
* 05:11 marostegui: Drop empty table edit_page_tracking from s7 - [[phab:T57385|T57385]]
* 05:08 marostegui: Stop MySQL on db1094 for upgrade
* 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 for upgrade (duration: 00m 50s)
* 03:19 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 53s)
* 01:16 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 50s)
 
== 2019-07-07 ==
* 20:13 urandom: decommissioning restbase1017-b -- [[phab:T222960|T222960]]
* 17:25 urandom: decommissioning restbase1017-a -- [[phab:T222960|T222960]]
* 15:14 godog: power reset restbase2009
 
== 2019-07-06 ==
* 07:56 thcipriani: restarting gerrit out of heap space
 
== 2019-07-05 ==
* 17:18 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 39s)
* 17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
* 17:17 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 01s)
* 17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
* 15:32 fsero: uploaded debian buster base docker image
* 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:23 fsero: restarting swift-container-sync on swift backends
* 15:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:51 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:15 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 13:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
* 13:33 fsero: disabling puppet on swift backends
* 13:26 fsero: restarting swift-container-sync on swift backends
* 13:05 ema: pool cp1090 w/ ATS backend [[phab:T226638|T226638]]
* 12:12 ema: depool cp1090 and reimage as upload_ats [[phab:T226638|T226638]]
* 11:46 ema: pool cp1088 w/ ATS backend [[phab:T226638|T226638]]
* 11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 11:38 jijiki: Reboot ms-be1021 - [[phab:T141756|T141756]] - [[phab:T227076|T227076]]
* 11:32 jijiki: Upgrading smartarray firmware on ms-be1021 - [[phab:T141756|T141756]] - [[phab:T227076|T227076]]
* 11:31 moritzm: installing postgresql-9.4 updates on jessie
* 11:10 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:00 ema: depool cp1088 and reimage as upload_ats [[phab:T226638|T226638]]
* 10:55 ema: pool cp1086 w/ ATS backend [[phab:T226638|T226638]]
* 10:29 moritzm: rebooting debug proxies to pick up MDS-enabled qemu
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 moritzm: rebooting seaborgium to pick up correct Stretch kernel
* 10:15 moritzm: rebooting serpens to pick up correct Stretch kernel
* 10:14 moritzm: fixed up kernel packages on serpens/seaborgium; these were dist-upgraded from jessie, but the correct kernel packages for Stretch were not set up, so they were still stuck with an old jessie kernel
* 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 jijiki: Rolling reboot of rdb* hosts - [[phab:T227304|T227304]]
* 10:00 moritzm: rebooting seaborgium to pick up MDS-enabled qemu
* 09:51 moritzm: rebooting serpens to pick up MDS-enabled qemu
* 09:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:39 ema: depool cp1086 and reimage as upload_ats [[phab:T226638|T226638]]
* 09:31 moritzm: rebooting LDAP replicas in eqiad
* 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:15 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=elastic2054.codfw.wmnet
* 09:01 moritzm: rebooting kraz (irc.wikimedia.org) to pick up MDS-enabled qemu
* 08:54 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:54 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 07:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
* 07:35 moritzm: installing imagemagick security updates on jessie
* 07:23 moritzm: installing wireshark security updates on jessie
* 07:17 marostegui: Compress small wikis on labsdb1009 [[phab:T222978|T222978]]
* 07:13 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 52s)
* 06:46 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with full weight (duration: 00m 49s)
* 06:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove old comments (duration: 00m 50s)
* 05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 after upgrade (duration: 00m 49s)
* 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 after upgrade (duration: 00m 49s)
* 05:23 marostegui: Upgrade db1104 [[phab:T227062|T227062]]
* 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 for upgrade (duration: 00m 51s)
* 05:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 05:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 05:09 marostegui: Stop MySQL on db1069 for decommission [[phab:T227166|T227166]]
* 05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
* 05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
* 05:02 marostegui: Remove db1069 from tendril and zarcillo - [[phab:T227166|T227166]]
 
== 2019-07-04 ==
* 21:50 volans@deploy1001: Finished deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10 (duration: 00m 48s)
* 21:50 volans@deploy1001: Started deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10
* 21:35 volans: forcing reboot of elastic2054 from console, host unresponsive - [[phab:T227298|T227298]]
* 17:03 AndyRussG: re-enabled banner impressions loader job
* 16:36 ema: pool cp1084 w/ ATS backend [[phab:T226638|T226638]]
* 16:02 AndyRussG: DjangoBannerStats revision changed from {{Gerrit|02be6cbb74}} to {{Gerrit|8965666e17}}
* 15:56 AndyRussG: temporarily disabled banner impressions loader job
* 15:34 ema: depool cp1084 and reimage as upload_ats [[phab:T226638|T226638]]
* 15:22 ema: pool cp1082 w/ ATS backend [[phab:T226638|T226638]]
* 14:51 twentyafterfour: phabricator: lowered phd.taskmasters config to 1 from 10
* 14:28 ema: depool cp1080 and reimage as upload_ats [[phab:T226638|T226638]]
* 13:51 volans: removing python-conftool (old py2 version) from all hosts - [[phab:T226965|T226965]]
* 13:40 ema: pool cp1080 w/ ATS backend [[phab:T226638|T226638]]
* 13:23 volans: upgraded scap to 3.11.0-1 on A:eqiad - [[phab:T227225|T227225]]
* 13:15 godog: reboot ms-be2037 after setting "os control" for power regulator mode - [[phab:T225713|T225713]]
* 13:05 volans: upgraded scap to 3.11.0-1 on A:codfw - [[phab:T227225|T227225]]
* 12:43 marostegui: Restore default replication consistency options on db2065 - [[phab:T227251|T227251]]
* 12:40 volans: upgraded scap to 3.11.0-1 on deploy[12]001 - [[phab:T227225|T227225]]
* 12:39 ema: depool cp1080 and reimage as upload_ats [[phab:T226638|T226638]]
* 12:24 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with low weight (duration: 00m 49s)
* 12:21 hoo: Started a Wikidata JSON dump run (sudo -b -u dumpsgen /usr/local/bin/dumpwikidatajson.sh) on snapshot1008 ([[phab:T227207|T227207]])
* 12:01 moritzm: upgrading buster installations to final frozen package state
* 11:59 jynus: stop and upgrade db1109 [[phab:T227062|T227062]]
* 11:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 50s)
* 11:47 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 45s)
* 11:38 volans: upgraded scap to 3.11.0-1 on A:mw-canary - [[phab:T227225|T227225]]
* 10:47 marostegui: Ease replication consistency option on db2065 to allow it to catch up a bit - [[phab:T227251|T227251]]
* 10:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:55 moritzm: rolling reboot of kubestagetcd* to pick up MDS-enabled qemu
* 09:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:41 moritzm: rearmed keyholder on netmon1002
* 09:36 moritzm: rebooting netmon1002 for kernel security update
* 09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:25 volans: uploaded scap_3.11.0-1 to {jessie,stretch,buster}-wikimedia APT - [[phab:T227225|T227225]]
* 09:07 moritzm: partly rearmed keyholder on deploy1001 (missing for apache2modsec)
* 09:00 moritzm: rebooting deploy1001 for kernel security update
* 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:41 marostegui: Repool labsdb1011 - [[phab:T222978|T222978]]
* 08:29 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 08:29 vgutierrez: upgrading acme-chief to version 0.18 in acme-chief test instances - [[phab:T225945|T225945]]
* 08:25 moritzm: rearmed keyholder on cumin1001
* 08:22 vgutierrez: uploaded acme-chief 0.18 to apt.wikimedia.org (buster) - [[phab:T225945|T225945]]
* 08:22 ema: pool cp1078 w/ ATS backend [[phab:T226638|T226638]]
* 08:21 moritzm: rebooting cumin1001 for kernel security update
* 08:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:08 marostegui: Upgrade db2044 - [[phab:T226952|T226952]]
* 08:00 moritzm: rearmed keyholder on cumin2001
* 07:57 moritzm: rebooting cumin2001 for kernel security update
* 07:55 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:55 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 07:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1069 from config as it will be decommissioned [[phab:T227166|T227166]] (duration: 00m 48s)
* 07:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1069 from config as it will be decommissioned [[phab:T227166|T227166]] (duration: 00m 49s)
* 07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 after upgrade (duration: 00m 49s)
* 07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 49s)
* 07:17 ema: depool cp1078 and reimage as upload_ats [[phab:T226638|T226638]]
* 07:09 moritzm: rebooting restbase-dev* for kernel security updates
* 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 48s)
* 06:45 moritzm: restarting archiva on archiva.wikimedia.org to pick up Java security update
* 06:42 elukey: update puppet compiler's facts
* 05:57 twentyafterfour: disabled phd on phab1003 while I clean things up. Registered the downtime in icinga
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after upgrade (duration: 00m 49s)
* 05:16 marostegui: Upgrade db1101 - [[phab:T227062|T227062]]
* 05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 for upgrade (duration: 00m 50s)
* 00:41 twentyafterfour: phabricator upgrade complete
* 00:27 twentyafterfour: Deploying Phabricator release/2019-07-03/1 from wmf/stable
* 00:21 cscott@deploy1001: Finished deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to {{Gerrit|d355bc90}} (deploy-20170703 branch, [[phab:T227216|T227216]]) (duration: 06m 48s)
* 00:15 cscott@deploy1001: Started deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to {{Gerrit|d355bc90}} (deploy-20170703 branch, [[phab:T227216|T227216]])
* 00:03 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy PB to wikisource, wikivoyage and wiktionary projects; [[phab:T218626|T218626]] (duration: 00m 50s)
 
== 2019-07-03 ==
* 23:26 foks: reset email for "Uwe Martens"
* 23:00 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/MobileFrontend/resources/dist/: [[phab:T221197|T221197]] schemaEditAttemptStep: only set bucket and anonymous-user-token on defaults if non-null (duration: 00m 51s)
* 22:59 mutante: stat1007 - jbd2/md0-8 invoked oom-killer
* 22:57 mutante: stat1007 - systemctl restart nagios-nrpe-server after OOM from some python process
* 20:58 XioNoX: add static backup routes for anycast recdns on cr1/2-codfw/eqiad - [[phab:T186550|T186550]]
* 20:45 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@350e74b]: Update mobileapps to {{Gerrit|94d0233}} ([[phab:T205550|T205550]]) (duration: 05m 11s)
* 20:40 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@350e74b]: Update mobileapps to {{Gerrit|94d0233}} ([[phab:T205550|T205550]])
* 20:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to {{Gerrit|fdb0108}} ([[phab:T205550|T205550]]) (duration: 01m 10s)
* 20:27 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to {{Gerrit|fdb0108}} ([[phab:T205550|T205550]])
* 20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to {{Gerrit|fdb0108}} ([[phab:T205550|T205550]]) (duration: 01m 25s)
* 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to {{Gerrit|fdb0108}} ([[phab:T205550|T205550]])
* 20:12 jeh: rebooting labmon1001 [[phab:T224228|T224228]]
* 19:58 jeh: rebooting labmon1002 [[phab:T224228|T224228]]
* 19:44 jeh: rebooting labpuppetmaster1001 [[phab:T224228|T224228]]
* 19:22 jeh: rebooting labpuppetmaster1002 [[phab:T224228|T224228]]
* 19:10 jeh: rebooting cloudelastic1004 [[phab:T224228|T224228]]
* 19:02 jeh: rebooting cloudelastic1003 [[phab:T224228|T224228]]
* 18:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase/data-access/src/GenericServices.php: [[phab:T227207|T227207]] Fix missing qualifier hashes in JSON output (duration: 00m 50s)
* 18:54 jeh: rebooting cloudelastic1002 [[phab:T224228|T224228]]
* 18:46 jeh: rebooting cloudelastic1001 [[phab:T224228|T224228]]
* 16:43 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
* 16:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
* 16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
* 16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 16:24 Urbanecm: Morning SWAT done
* 16:23 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/ReadingLists/: SWAT: [[:gerrit:520480{{!}}Fix API continuation]] ([[phab:T226640|T226640]]) (duration: 00m 49s)
* 16:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:519987{{!}}Enable DataBridge on Beta (T226816)]] (production no-op) (duration: 00m 54s)
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:18 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:18 robh@cumin1001: START - C