
Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove unused preference T47877-buster (duration: 00m 47s))
imported>Stashbot
(herron: performing rolling reboots of logstash codfw frontends for security updates)
== 2019-06-07 ==
* 18:56 herron: performing rolling reboots of logstash codfw frontends for security updates
* 18:22 cstone: Update payments-wiki revision changed from {{Gerrit|c6c7bbf71e}} to {{Gerrit|75abd71cc1}}
* 15:34 godog: bounce rsyslog on wezen - [[phab:T199406|T199406]]
* 00:00 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove unused preference [[phab:T47877|T47877]]-buster (duration: 00m 47s)
* 00:00 bstorm_: [[phab:T224850|T224850]] repooled labsdb1009 after completing view updates


== 2019-06-06 ==
* 23:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Specify the fluidsynth paths for TMH MIDI conversion [[phab:T135597|T135597]] (duration: 00m 47s)
* 23:56 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove [[phab:T225183|T225183]] (duration: 00m 48s)
* 23:03 jeh: [[phab:T224850|T224850]] depooled labsdb1009
* 22:42 bstorm_: [[phab:T224850|T224850]] repooled labsdb1011
* 21:01 bstorm_: [[phab:T224850|T224850]] depooled labsdb1011
* 20:58 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: Stop setting wgSquidServersNoPurge, MW now uses wgCdnServersNoPurge (duration: 00m 47s)
* 20:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgSquidMaxage, MW now uses wgCdnMaxAge (duration: 00m 46s)
* 20:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgUseSquid or using wgSquidServersNoPurge, duplicate existing values (duration: 00m 48s)
* 20:49 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Drop backwards-compatibility for dataSquidMaxage (duration: 00m 48s)
* 19:47 herron: performing rolling reboot of eqiad logstash hw for MDS security updates
* 18:58 jbond42: reimage sarin to stretch
* 18:39 jbond42: mw1249 - sudo systemctl restart php7.2-fpm.service
* 18:38 papaul: shutting down backup2001 for 10G nic troubleshooting
* 18:24 bstorm_: [[phab:T224850|T224850]] repooled labsdb1010 after completing view run
* 18:04 jijiki: Continuing rolling restarts of php-fpm in eqiad
* 17:30 elukey: restart mcrouter on mw2271 (codfw proxy) to pick up new config changes
* 15:56 bstorm_: [[phab:T224850|T224850]] depooled labsdb1010 for view updates
* 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:05 moritzm: rolling reboot of sessionstore hosts in eqiad for kernel security update
* 15:02 _joe_: rolling restart of php-fpm on {appservers,api} in eqiad, in groups of 4, staggered by 10 minutes, to pick up the new opcache settings
* 14:57 bstorm_: [[phab:T224850|T224850]] update views on labsdb1012
* 14:43 moritzm: updating qemu packages on ganeti hosts to deploy support for md_clear/MDS for Ganeti instances
* 14:43 elukey: restart mcrouter on mw2255 (codfw proxy) to pick up new config changes
* 14:22 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: fix logspam (duration: 00m 48s)
* 14:18 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 13:54 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: fix logspam (duration: 00m 47s)
* 13:44 moritzm: rolling reboot of sessionstore hosts in codfw for kernel security update
* 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:36 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 13:35 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.8
* 13:35 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart-wdqs (exit_code=99)
* 13:35 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 13:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 13:33 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 13:32 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 12:44 jbond42: reimage neodymium
* 12:23 _joe_: running puppet, restarting php-fpm on the canaries to pick up the new opcache size
* 12:11 ema: cp1075: repool with varnish 5.1.3-1wm10 [[phab:T224694|T224694]]
* 12:10 elukey: restart mcrouter on mw2235
* 12:05 Lucas_WMDE: EU SWAT done
* 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:514700{{!}}Revert "Specify $wgWBRepoSettings['conceptBaseUri']"]] (duration: 00m 56s)
* 12:00 ema: cp1075: upgrade varnish to 5.1.3-1wm10 [[phab:T224694|T224694]]
* 11:55 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 11:48 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikisource --fix ([[phab:T216322|T216322]])
* 11:47 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikibooks --fix for [[phab:T216322|T216322]]
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:514678{{!}}Add new namespaces for several Thai projects]] ([[phab:T216322|T216322]]) (duration: 00m 54s)
* 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:514534{{!}}Remove unused config variable wgWikibaseEnableSenses]] (duration: 00m 55s)
* 11:23 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/CirrusSearch/: SWAT: [[gerrit:514566{{!}}Fix event validation error for cirrussearch-request event]] (duration: 01m 06s)
* 10:55 elukey: restart mcrouter on mw2163 (codfw mcrouter proxy)
* 10:43 mobrovac@deploy1001: scap-helm mathoid finished
* 10:43 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
* 10:43 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
* 10:43 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 10:30 ema: varnish 5.1.3-1wm10 uploaded to stretch-wikimedia [[phab:T224694|T224694]]
* 10:19 elukey: rolling restart of mcrouter on mw1* hosts to pick up config change (batch of 5 hosts, depool/run-puppet/pool)
* 10:12 elukey: disable puppet on mw1* and mw[2163,2235,2255,2271] as prep step for mcrouter config deploy
* 10:10 fsero: rolled back the last deployment of mathoid to revision 16
* 09:59 mobrovac@deploy1001: scap-helm mathoid finished
* 09:59 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
* 09:59 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
* 09:59 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 09:32 moritzm: rebooting mwdebug2002 for some tests
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:28 moritzm: updating qemu on ganeti2004 for some tests
* 09:24 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
* 08:38 marostegui: Stop MySQL on db1117:3322 - this will trigger haproxy alerts - [[phab:T222682|T222682]]
* 07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 after upgrade [[phab:T224852|T224852]] (duration: 00m 53s)
* 07:20 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labs hosts for s6 - [[phab:T224852|T224852]]
* 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2046 to s6 master as db2039 will be decommissioned [[phab:T221533|T221533]] (duration: 00m 55s)
* 06:31 marostegui: Start topology changes on s6 codfw to promote db2046 as master - [[phab:T221533|T221533]]
* 06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 for upgrade [[phab:T224852|T224852]] (duration: 00m 55s)
* 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after getting its BBU replaced (duration: 00m 54s)
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 01m 01s)
* 05:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
* 05:41 marostegui: Upgrade MySQL on s6 codfw hosts in preparation for s6 codfw master failover - [[phab:T221533|T221533]]
* 05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
* 05:18 marostegui: Remove db2042 from tendril and zarcillo [[phab:T225090|T225090]]
* 05:18 marostegui: Remove db2042 from tendril and zarcillo
* 05:14 marostegui: Stop MySQL on db2042 to copy its content to dbprov2001 as a temporary backup - [[phab:T225090|T225090]]
* 05:11 marostegui: Disable notifications db2042 - [[phab:T225090|T225090]]
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after getting its BBU replaced [[phab:T225060|T225060]] (duration: 00m 56s)


== 2019-06-05 ==
*15:09 elukey: reboot thorium for kernel upgrades
* 22:15 chaomodus: restarting gerrit on cobalt due to it being down (seems like Java out of heap space)
*14:00 ema: pool cp3039 w/ ATS backend [[phab:T222937|T222937]]
* 20:43 mforns@deploy1001: Finished deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to {{Gerrit|0660e70153dec892ae20bee7119a72cc17e8ec87}} (duration: 19m 30s)
*13:15 ema: depool cp3039 and reimage as upload_ats [[phab:T222937|T222937]]
* 20:39 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Turn off some FR config [[phab:T225138|T225138]] (duration: 00m 54s)
*13:04 arturo: aborrero@cumin1001:~ $ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --enable && run-puppet-agent" (patch already merged)
* 20:25 akosiaris@deploy1001: scap-helm blubberoid finished
*13:03 arturo: aborrero@cumin1001:~$ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --disable 'arturo merging systemd timer nrpe change'" (19 hosts affected) merging: https://gerrit.wikimedia.org/r/c/operations/puppet/+/514988
* 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
*11:45 ema: pool cp3043 w/ ATS backend [[phab:T222937|T222937]]
* 20:25 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
*10:51 jbond42: upload libcpp-hocon0.1.6_0.1.6-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
* 20:25 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
* 10:45 jbond42: upload libleatherman-data_1.4.0+dfsg-1~bpo9+1_all.deb to wikimedia-stretch component/facter3
* 20:23 mforns@deploy1001: Started deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to {{Gerrit|0660e70153dec892ae20bee7119a72cc17e8ec87}}
*10:43 ema: depool cp3043 and reimage as upload_ats [[phab:T222937|T222937]]
* 19:57 hashar: contint1001: docker container prune -f && docker image prune -f  # reclaimed 166 MB and 3.4 GB
*10:09 _joe_: restarting php-fpm on the codfw hosts to pick up the recent changes in opcache
* 19:48 marostegui: Check data consistency on db1091 against db1135 - [[phab:T225060|T225060]]
*09:59 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
* 19:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[phab:T225115|T225115]] (duration: 00m 54s)
*09:49 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo8+1_amd64.deb to wikimedia-jessie component/facter3
* 17:36 marostegui: Start replication db1091 - [[phab:T225060|T225060]]
*09:16 mobrovac@deploy1001: scap-helm mathoid finished
* 17:32 marostegui: Start MySQL with replication stopped on db1091 - [[phab:T225060|T225060]]
*09:16 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
* 16:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert user-blocks-change to use eventbus and old schema - [[phab:T211248|T211248]] (duration: 00m 54s)
*09:16 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
* 16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: use eventgate-main for 2 events on all wikis - [[phab:T211248|T211248]] (duration: 00m 55s)
*09:16 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 16:11 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceStreamConfig and switch 2 topics in group0 [[phab:T222822|T222822]] (duration: 00m 56s)
*09:00 marostegui: Upgrade x1 codfw hosts in preparation for its failover [[phab:T220170|T220170]]
* 16:11 XioNoX: remove BGP to AS38082 on cr4-ulsfo (left the IXP)
*08:46 elukey: start the reboot of the Analytics Hadoop's worker nodes for kernel+openjdk upgrades
* 15:46 reedy@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
*08:24 marostegui: Upgrade s2 codfw to 10.1.39 in preparation for its codfw failover - [[phab:T221533|T221533]]
* 15:44 reedy@deploy1001: Finished scap: Rebuild .8 i18n for FlaggedRevs (duration: 41m 14s)
*08:19 XioNoX: remove BGP session to AS55658 on cr1-eqsin (left the IXP)
* 15:36 moritzm: installing exim4 security updates
*08:12 vgutierrez: upgrading certbot in wikitech-static
* 15:03 reedy@deploy1001: Started scap: Rebuild .8 i18n for FlaggedRevs
*07:29 marostegui: Drop unused temporary test tables on db1111 and db1112
* 14:24 marostegui: Poweroff db1091 for BBU replacement - [[phab:T225060|T225060]]
* 05:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2051 from s4 to s2 [[phab:T221533|T221533]] (duration: 00m 49s)
* 13:57 elukey: restart mcrouter on MediaWiki app/api canaries to pick up new config change (timeouts before marking a memcached shard as TKO from 3 to 10) - [[phab:T203786|T203786]]
* 13:56 jijiki: enabling puppet and pooling on mw* canaries
* 13:17 jynus: start es2,es3 backup on codfw
* 13:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.8
* 13:03 hashar: restarting Jenkins
* 12:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 54s)
* 12:46 Lucas_WMDE: EU SWAT finished
* 12:32 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/WikimediaMessages/: SWAT: [[gerrit:514460{{!}}Fix wikidata copyright message (T224536)]] (duration: 00m 56s)
* 11:43 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:514449{{!}}Enable the new history page in the advanced mobile contributions mode (T219895)]] (duration: 00m 56s)
* 11:27 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[:gerrit:514413{{!}}Remove project namespace from flaggedrevs on ruwikisource]] ([[phab:T225037|T225037]]) (duration: 00m 54s)
* 10:57 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs: [[gerrit:514456{{!}}Add ext.flaggedRevs.icons to modules registeration]] (duration: 00m 57s)
* 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 55s)
* 10:09 godog: mount sdb3 on ms-be1022 - [[phab:T225079|T225079]]
* 09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1135 with very low weight on s4 (duration: 00m 55s)
* 09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool without traffic db1135 into s4 [[phab:T225060|T225060]] (duration: 00m 55s)
* 09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool without traffic db1135 into s4 [[phab:T225060|T225060]] (duration: 00m 56s)
* 08:42 onimisionipe: removing maps2001 from cassandra cluster. It is going to be reimaged - [[phab:T224395|T224395]]
* 08:40 _joe_: rolling restart of php7 on the api servers, to test a different strategy of restarting compared to the appservers.
* 08:21 _joe_: performing a rolling restart of the php appservers via cumin to test speed and safety of the operations proposed in [[phab:T224857|T224857]]
* 08:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:12 moritzm: rebooting pybal-test2001 for tests with new qemu
* 08:12 ema: pool cp3035 w/ ATS backend [[phab:T222937|T222937]]
* 08:12 marostegui: Reboot db1091 [[phab:T225060|T225060]]
* 08:05 moritzm: installing qemu security updates on Ganeti hosts
* 07:45 marostegui: Transfer dbprov1001.eqiad.wmnet:snapshot.s4.2019-06-04--21-37-03.tar.gz to db1135 to provision it on s4 [[phab:T225060|T225060]]
* 07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1091 status (duration: 00m 56s)
* 07:22 ema: depool cp3035 and reimage as upload_ats [[phab:T222937|T222937]]
* 07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 - host went down (duration: 00m 55s)
* 06:45 marostegui: Restart MySQL on db2110 to get the binlog format changed to STATEMENT - [[phab:T220170|T220170]]
* 06:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2090 to s4 codfw master [[phab:T220170|T220170]] (duration: 00m 54s)
* 06:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Mimic s4 codfw weights to eqiad [[phab:T220170|T220170]] (duration: 00m 55s)
* 06:17 marostegui: Start topology changes on s4 codfw to replace current master db2051 with db2090 - [[phab:T220170|T220170]]
* 06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 into API (duration: 00m 54s)
* 05:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 after upgrade [[phab:T224852|T224852]] (duration: 00m 55s)
* 05:49 marostegui: Upgrade MySQL on db1084 [[phab:T224852|T224852]]
* 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 for upgrade [[phab:T224852|T224852]] (duration: 01m 06s)
* 05:31 marostegui: Stop MySQL on db1125 (sanitarium) s2,s4,s6,s7 to upgrade mysql - [[phab:T224852|T224852]]
* 05:29 marostegui: Keep compressing tables on labsdb1012 - [[phab:T222978|T222978]]
* 05:22 marostegui: Change replication topology on m3 codfw to promote db2065 as codfw master instead of db2042 - [[phab:T221533|T221533]]
* 05:07 marostegui: Upgrade MySQL on labsdb1012 - [[phab:T224852|T224852]]
* 04:09 onimisionipe: starting postgres slave init on maps2001 - [[phab:T224395|T224395]]


== 2019-06-04 ==
* 23:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change log level to debug for PageTriage (duration: 01m 03s)
* 22:06 eileen: civicrm revision changed from {{Gerrit|506ebe2f2a}} to {{Gerrit|5c02e62d6e}}, config revision is {{Gerrit|63438eea43}}
* 21:08 jbond42: finished rolling reboots of mw1* servers
* 21:07 jbond42: finished rolling reboots of mw1* servers
* 20:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 20:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 20:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 20:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 20:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:48 XioNoX: replace logstash.svc.eqiad.wmnet syslog target with syslog.codfw.wmnet on cr4-ulsfo - [[phab:T224128|T224128]]
* 19:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:41 jbond42: reboot mwdebug1002
* 19:36 jbond42: reboot mwdebug1001
* 19:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 19:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:10 herron: correction — performing rolling reboots of codfw logstash hardware hosts for MDS security updates
* 18:10 herron: performing rolling reboots of eqiad logstash hardware hosts for MDS security updates
* 18:06 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:04 bblack: pool cp3045 - [[phab:T222937|T222937]]
* 17:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:58 legoktm: deleted some gerrit changes
* 16:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:32 marostegui: Compress some more tables on labsdb1012 before upgrading the host tomorrow [[phab:T222978|T222978]]
* 16:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:14 bblack: repool cp3035 (still varnish-be, but freshly installed!)
* 16:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:12 jbond42: starting rolling reboots of mw1*
* 16:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
* 16:08 bblack: depool cp3045 for reimage - [[phab:T222937|T222937]]
* 15:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: JADE - [[phab:T212182|T212182]] (duration: 00m 53s)
* 15:55 reedy@deploy1001: Synchronized wmf-config/extension-list: JADE - [[phab:T212182|T212182]] (duration: 00m 53s)
* 15:52 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Jade: Consistency (duration: 01m 08s)
* 15:50 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Configure eventgate-main EventService. No-op in prod. [[phab:T211248|T211248]] (duration: 01m 19s)
* 15:41 bblack: reboot cp3035 post-reimage
* 15:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Use eventgate-main in beta. No-op in prod. [[phab:T211248|T211248]] (duration: 00m 49s)
* 15:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.8
* 15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:13 moritzm: draining ganeti1003 for eventual reboot to MDS-enabled Linux kernel
* 15:13 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache (duration: 29m 46s)
* 15:04 moritzm: failover Ganeti master in eqiad to ganeti1001
* 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:51 bblack: depool cp3035 for ATS reimage - [[phab:T222937|T222937]]
* 14:43 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache
* 14:41 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 [keeping static files] (duration: 01m 38s)
* 14:39 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 01m 34s)
* 14:36 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 (duration: 11m 02s)
* 13:53 jbond42: restart mtail on lithium
* 13:46 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:46 fsero@cumin1001: START - Cookbook sre.hosts.decommission
* 13:30 jbond42: starting rolling reboots of mw1*
* 13:12 moritzm: draining ganeti1008 for eventual reboot to MDS-enabled Linux kernel
* 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:22 Urbanecm: ran mwscript deleteBatch.php --wiki=sawikisource -r '[[:phab:T214553{{!}}T214553]]: deleting useless red
* 12:13 akosiaris: restart pybal on lvs2003, lvs1015 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 12:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 46s)
* 12:04 akosiaris: restart pybal on lvs2006 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 11:40 akosiaris: restart pybal on lvs1015 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 11:39 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/includes/: [[phab:T221577|T221577]] / {{Gerrit|1286d131c01886}} (duration: 01m 04s)
* 11:39 jijiki: enabling puppet on mc1*
* 11:38 Urbanecm: run mwscript namespaceDupes.php --wiki=kuwiktionary --fix ([[phab:T224327|T224327]])
* 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:514239{{!}}Custom namespaces for ku.wiktionary]] ([[phab:T224327|T224327]]) (duration: 00m 46s)
* 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:507931{{!}}Add localized project logo for sahwikiquote]] (2/2, [[phab:T222065|T222065]]) (duration: 00m 47s)
* 11:34 urbanecm@deploy1001: Synchronized static/images/project-logos/: [[:gerrit:507931{{!}}Add localized project logo for sahwikiquote]] (1/2, [[phab:T222065|T222065]]) (duration: 00m 47s)
* 11:31 jijiki: enabling puppet on mc2*
* 11:29 Urbanecm: running mwscript namespaceDupes.php --wiki=sawikisource --add-prefix=[[phab:T214553|T214553]] --fix ([[phab:T214553|T214553]])
* 11:28 Urbanecm: run mwscript namespaceDupes.php --wiki=thwiki --fix ([[phab:T216322|T216322]])
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:486221{{!}}Add Author namespace in Sanskrit Wikisource]] ([[phab:T214553|T214553]]) (duration: 00m 46s)
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: [[:gerrit:495918{{!}}Create new protection levels for dewiktionary]] (2/2, [[phab:T216885|T216885]]) (duration: 00m 47s)
* 11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:495918{{!}}Create new protection levels for dewiktionary]] (1/2, [[phab:T216885|T216885]]) (duration: 00m 47s)
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:494016{{!}}Add editcontentmodel right to the templateeditor group on testwiki]] ([[phab:T217499|T217499]]) (duration: 00m 47s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:491054{{!}}Add new namespaces for th.wiki]] ([[phab:T216322|T216322]]) (duration: 00m 47s)
* 11:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/: [[phab:T221577|T221577]] / {{Gerrit|1286d131c01886}} (duration: 01m 07s)
* 11:02 moritzm: draining ganeti1007 for eventual reboot to MDS-enabled Linux kernel
* 11:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:44 jbond42: mw1* restarts will be delayed until 11:15
* 10:42 jbond42: will start rolling reboots of mw1* servers at 10:50
* 09:27 moritzm: draining ganeti1006 for eventual reboot to MDS-enabled Linux kernel
* 09:25 jijiki: disable puppet on mc* hosts to merge 511963 and 511973
* 09:01 moritzm: draining ganeti1005 for eventual reboot to MDS-enabled Linux kernel
* 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 elukey: remove memcached nutcracker config from mw1* hosts (not used). Changes will be picked up when nutcracker is restarted (after reboots, etc.) - [[phab:T214275|T214275]]
* 08:23 moritzm: draining ganeti1004 for eventual reboot to MDS-enabled Linux kernel
* 08:04 marostegui: Stop MySQL on db2046 to clone db2058 - [[phab:T221533|T221533]]
* 08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 47s)
* 08:03 elukey: restart hive-server2 on an-coord1001 to pick up new GC/Heap settings
* 07:35 mobrovac@deploy1001: Finished deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - [[phab:T210651|T210651]] (duration: 19m 16s)
* 07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:21 moritzm: draining ganeti1002 for eventual reboot to MDS-enabled Linux kernel
* 07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2058 from s4 to s6 (duration: 00m 47s)
* 07:16 mobrovac@deploy1001: Started deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - [[phab:T210651|T210651]]
* 06:57 elukey: restart hive metastore on an-coord1001 to apply new GC/heap settings
* 06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after upgrade (duration: 00m 48s)
* 06:21 elukey: restart pdfrender on scb1002 (flapping)
* 06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after upgrade (duration: 00m 47s)
* 05:54 marostegui: Stop MySQL on db2078:m3 - [[phab:T221533|T221533]]
* 05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after upgrade (duration: 00m 47s)
* 05:40 marostegui: Stop MySQL on db1091 for MySQL upgrade [[phab:T224852|T224852]]
* 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 for upgrade (duration: 00m 48s)
* 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097 after upgrade (duration: 00m 46s)
* 05:19 marostegui: Stop MySQL on db1097 for upgrade
* 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade (duration: 00m 47s)
* 04:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1081 from API (duration: 00m 49s)
* 01:10 bstorm_: [[phab:T223406|T223406]] depooled/repooled labsdb1009 for view updates
* 00:09 bstorm_: [[phab:T223406|T223406]] repooled labsdb1011 after completing view updates


== 2019-06-03 ==
*23:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Specify the fluidsynth paths for TMH MIDI conversion [[phab:T135597|T135597]] (duration: 00m 47s)
* 22:20 bstorm_: [[phab:T223406|T223406]] depooled labsdb1011
*23:56 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove [[phab:T225183|T225183]] (duration: 00m 48s)
* 22:09 bstorm_: [[phab:T223406|T223406]] repooled labsdb1010 after completing view updates
*23:03 jeh: [[phab:T224850|T224850]] depooled labsdb1009
* 21:29 XioNoX: drop all ICMP frag on all routers - [[phab:T224186|T224186]]
*22:42 bstorm_: [[phab:T224850|T224850]] repooled labsdb1011
* 19:57 XioNoX: stop sampling from cr2-eqiad
*21:01 bstorm_: [[phab:T224850|T224850]] depooled labsdb1011
* 18:48 XioNoX: Add RPKI validators to all routers - [[phab:T220669|T220669]]
*20:58 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: Stop setting wgSquidServersNoPurge, MW now uses wgCdnServersNoPurge (duration: 00m 47s)
* 18:35 hashar: switch most Quibble jobs to node 10 [[phab:T222406|T222406]] - https://gerrit.wikimedia.org/r/#/c/integration/config/+/514034/ [[phab:T222406|T222406]]
*20:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgSquidMaxage, MW now uses wgCdnMaxAge (duration: 00m 46s)
* 18:35 XioNoX: drop all ICMP frag on cr1/2-eqiad - [[phab:T224186|T224186]]
*20:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgUseSquid or using wgSquidServersNoPurge, duplicate existing values (duration: 00m 48s)
* 18:17 XioNoX: add routinator 0.4.0 to APT repo - [[phab:T220669|T220669]]
*20:49 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Drop backwards-compatibility for dataSquidMaxage (duration: 00m 48s)
* 17:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4 (duration: 11m 29s)
*19:47 herron: performing rolling reboot of eqiad logstash hw for MDS security updates
* 17:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4
*18:58 jbond42: reimage sarin to stretch
* 16:40 onimisionipe: started osm-import on maps2004 - [[phab:T224395|T224395]]
*18:39 jbond42: mw1249 - sudo systemctl restart php7.2-fpm.service
* 16:30 bstorm_: [[phab:T223406|T223406]] depooled labsdb1010 for view updates
*18:38 papaul: shutting down backup2001 for 10G nic troubleshooting
* 15:39 bstorm_: [[phab:T223406|T223406]] labsdb1012 updated views for actor table changes
*18:24 bstorm_: [[phab:T224850|T224850]] repooled labsdb1010 after completing view run
* 14:46 akosiaris: deploy kask in sessionstore kubernetes namespace in eqiad, codfw [[phab:T220401|T220401]]
*18:04 jijiki: Continuing rolling restarts of php-fpm in eqiad
* 14:34 arturo: [[phab:T221769|T221769]] reimaging cloudservices1003 to stretch
*17:30 elukey: restart mcrouter on mw2271 (codfw proxy) to pick up new config changes
* 14:20 vgutierrez: upgrading acme-chief to version 0.17 in acme-chief production instances - [[phab:T220518|T220518]]
*15:56 bstorm_: [[phab:T224850|T224850]] depooled labsdb1010 for view updates
* 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:53 moritzm: draining ganeti1001 for eventual reboot to MDS-enabled Linux kernel
*15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Drop caption edit counter unlock delay to 0 (duration: 00m 49s)
*15:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1138 into s4 API (duration: 00m 48s)
*15:05 moritzm: rolling reboot of sessionstore hosts in eqiad for kernel security update
* 13:19 marostegui: Move db2078:3321 under db2062 [[phab:T220170|T220170]]
*15:02 _joe_: rolling restart of php-fpm on {appservers,api} in eqiad, in groups of 4, staggered by 10 minutes, to pick up the new opcache settings
* 13:03 arturo: add prometheus-pdns-rec-exporter v0.7 to stretch-wikimedia ([[phab:T224877|T224877]])
*14:57 bstorm_: [[phab:T224850|T224850]] update views on labsdb1012
* 12:56 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on remaining wikis ([[phab:T188327|T188327]]) (duration: 00m 48s)
*14:43 moritzm: updating qemu packages on ganeti hosts to deploy support for md_clear/MDS for Ganeti instances
* 12:24 arturo: add prometheus-pdns-exporter v0.4 to stretch-wikimedia ([[phab:T224877|T224877]])
*14:43 elukey: restart mcrouter on mw2255 (codfw proxy) to pick up new config changes
* 11:28 gehel: reboot relforge for microcode + jvm upgrade
*14:22 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: fix logspam (duration: 00m 48s)
* 11:17 jijiki: Restarting php7.2-fpm in eqiad in batches of 2 for 513949
*14:18 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 11:15 Urbanecm: EU SWAT done
*13:54 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: fix logspam (duration: 00m 47s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:513740{{!}}Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki]] ([[phab:T224215|T224215]]) (duration: 00m 47s)
*13:44 moritzm: rolling reboot of sessionstore hosts in codfw for kernel security update
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:503680{{!}}Add 5 active namespaces for VisualEditor on en.wikiversity]] ([[phab:T220881|T220881]]) (duration: 00m 48s)
*13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:513720{{!}}Add "Zerrenda" (list) namespace to VisualEditor on euwiki]] ([[phab:T224801|T224801]]) (duration: 00m 48s)
*13:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:52 moritzm: upgrading maps servers to new Java security release
*13:36 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 10:47 moritzm: upgrading WDQS servers to new Java security release
*13:35 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.8
* 10:42 vgutierrez: upgrading prometheus-trafficserver-exporter in upload_ats ulsfo instances
*13:35 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart-wdqs (exit_code=99)
* 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:513972{{!}} Bumping portals to master (T128546)]] (duration: 00m 47s)
*13:35 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:513972{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
*13:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 10:36 jijiki: Restarting php7.2-fpm in codfw in batches of 2 for 513949
*13:33 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 10:34 moritzm: upgrading Elastic servers to new Java security release
*13:32 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
* 10:26 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service (duration: 03m 15s)
*13:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
* 10:23 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service
*12:44 jbond42: reimage neodymium
* 10:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=kartotherian
*12:23 _joe_: running puppet, restarting php-fpm on the canaries to pick up the new opcache size
* 10:02 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=kartotherian
*12:11 ema: cp1075: repool with varnish 5.1.3-1wm10 [[phab:T224694|T224694]]
* 09:48 onimisionipe: depooled maps codfw due to lag and disk issues - [[phab:T224395|T224395]]
*12:10 elukey: restart mcrouter on mw2235
* 09:46 moritzm: upgrading Druid/Kafka-Jumbo servers to new Java security release (will be picked up by forthcoming MDS reboots)
*12:05 Lucas_WMDE: EU SWAT done
* 09:43 moritzm: upgrading AQS servers to new Java security release (will be picked up by forthcoming MDS reboots)
*12:04 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:514700|Revert "Specify $wgWBRepoSettings['conceptBaseUri']"]] (duration: 00m 56s)
* 09:33 moritzm: upgrading Hadoop servers to new Java security release (will be picked up by forthcoming MDS reboots)
*12:00 ema: cp1075: upgrade varnish to 5.1.3-1wm10 [[phab:T224694|T224694]]
* 08:18 ema: cp1077: restart varnish-be
*11:55 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 08:17 elukey: manually removed phab_clean_tmp from www-data's crontab on phab1001 to reduce cronspam
*11:48 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikisource --fix ([[phab:T216322|T216322]])
* 08:16 ema: cp1075: restart varnish-be
*11:47 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikibooks --fix for [[phab:T216322|T216322]]
* 08:03 marostegui: Stop MySQL on db1064 [[phab:T223217|T223217]]
*11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:514678{{!}}Add new namespaces for several Thai projects]] ([[phab:T216322|T216322]]) (duration: 00m 54s)
* 08:01 marostegui: Remove db1064 from tendril and zarcillo [[phab:T223217|T223217]]
*11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:514534{{!}}Remove unused config variable wgWikibaseEnableSenses]] (duration: 00m 55s)
* 07:58 elukey: refresh field list for logstash (via kibana Management -> Index patterns -> etc..)
*11:23 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 07:48 marostegui: Repool db1103 after upgrade [[phab:T224852|T224852]]
*11:22 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/CirrusSearch/: SWAT: [[gerrit:514566{{!}}Fix event validation error for cirrussearch-request event]] (duration: 01m 06s)
* 07:29 marostegui: Stop MySQL on db1103 (s2 and s4) for upgrade [[phab:T224852|T224852]]
*10:55 elukey: restart mcrouter on mw2163 (codfw mcrouter proxy)
* 07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 for upgrade (duration: 00m 47s)
*10:43 mobrovac@deploy1001: scap-helm mathoid finished
* 07:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API after upgrade (duration: 00m 48s)
*10:43 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
* 06:50 elukey: roll restart varnishkafka (via puppet) for a config change - [[phab:T224236|T224236]]
*10:43 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
* 06:46 kartik@deploy1001: scap-helm cxserver finished
*10:43 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 06:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
*10:30 ema: varnish 5.1.3-1wm10 uploaded to stretch-wikimedia [[phab:T224694|T224694]]
* 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
*10:19 elukey: rolling restart of mcrouter on mw1* hosts to pick up config change (batch of 5 hosts, depool/run-puppet/pool)
* 06:45 kartik@deploy1001: scap-helm cxserver finished
*10:12 elukey: disable puppet on mw1* and mw[2163,2235,2255,2271] as prep step for mcrouter config deploy
* 06:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
*10:10 fsero: rolled back last deployment of mathoid to revision 16
* 06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
*09:59 mobrovac@deploy1001: scap-helm mathoid finished
* 06:44 kartik@deploy1001: scap-helm cxserver finished
*09:59 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
* 06:44 kartik@deploy1001: scap-helm cxserver cluster staging completed
*09:59 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
* 06:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
*09:59 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API after upgrade (duration: 00m 49s)
*09:32 moritzm: rebooting mwdebug2002 for some tests
* 06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 after upgrade (duration: 00m 46s)
*09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:04 marostegui: Stop MySQL on db1081 for upgrade - [[phab:T224852|T224852]]
*09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 for upgrade (duration: 00m 47s)
*09:28 moritzm: updating qemu on ganeti2004 for some tests
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 [[phab:T213422|T213422]] (duration: 00m 46s)
*09:24 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
* 05:45 marostegui: Upgrade mariadb on dbstore1004 - [[phab:T224852|T224852]]
*08:38 marostegui: Stop MySQL on db1117:3322 - this will trigger haproxy alerts - [[phab:T222682|T222682]]
* 05:17 marostegui: Upgrade MariaDB on codfw hosts in preparation for s4 master failover [[phab:T217396|T217396]]
*07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 after upgrade [[phab:T224852|T224852]] (duration: 00m 53s)
* 05:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 [[phab:T213422|T213422]] (duration: 00m 46s)
*07:20 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labs hosts for s6 - [[phab:T224852|T224852]]
* 05:05 marostegui: Remove db2037 from tendril and zarcillo [[phab:T224720|T224720]]
*07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2046 to s6 master as db2039 will be decommissioned [[phab:T221533|T221533]] (duration: 00m 55s)
* 05:04 marostegui: Stop MySQL on db2037 for decommission [[phab:T224720|T224720]]
*06:31 marostegui: Start topology changes on s6 codfw to promote db2046 as master - [[phab:T221533|T221533]]
* 04:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 [[phab:T213422|T213422]] (duration: 00m 51s)
*06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 for upgrade [[phab:T224852|T224852]] (duration: 00m 55s)
*06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after getting its BBU replaced  (duration: 00m 54s)
*06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced  (duration: 01m 01s)
*05:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced  (duration: 00m 55s)
*05:41 marostegui: Upgrade MySQL on s6 codfw hosts in preparation for s6 codfw master failover - [[phab:T221533|T221533]]
*05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced  (duration: 00m 55s)
*05:18 marostegui: Remove db2042 from tendril and zarcillo [[phab:T225090|T225090]]
*05:18 marostegui: Remove db2042 from tendril and zarcillo
*05:14 marostegui: Stop MySQL on db2042 to copy its content to dbprov2001 as a temporary backup - [[phab:T225090|T225090]]
*05:11 marostegui: Disable notifications db2042 - [[phab:T225090|T225090]]
*05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after getting its BBU replaced [[phab:T225060|T225060]] (duration: 00m 56s)


== 2019-06-02 ==
==2019-06-05==
* 20:28 onimisionipe: pooled wdqs1007. It caught up on lag
* 15:24 onimisionipe: depooled wdqs1007 to catch up on lags
* 15:22 onimisionipe: depool wdqs internal cluster to allow them to catch up on lag, depooling one at a time
* 03:09 andrewbogott: restarting pdns-recursor on cloudservices 1003 and 1004 (but not at the same time)


== 2019-06-01 ==
*22:15 chaomodus: restarting gerrit on cobalt due to it being down (seems like Java out of heap space)
* 22:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: [[phab:T224812|T224812]] / {{Gerrit|bd4fbfddbe1a0}} (duration: 01m 07s)
*20:43 mforns@deploy1001: Finished deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to {{Gerrit|0660e70153dec892ae20bee7119a72cc17e8ec87}} (duration: 19m 30s)
*20:39 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Turn off some FR config [[phab:T225138|T225138]] (duration: 00m 54s)
*20:25 akosiaris@deploy1001: scap-helm blubberoid finished
*20:25 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
*20:25 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
*20:25 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
*20:23 mforns@deploy1001: Started deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to {{Gerrit|0660e70153dec892ae20bee7119a72cc17e8ec87}}
*19:57 hashar: contint1001: docker container prune -f && docker image prune -f  # reclaimed 166 MB and 3.4 GB
*19:48 marostegui: Check data consistency on db1091 against db1135 - [[phab:T225060|T225060]]
*19:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[phab:T225115|T225115]] (duration: 00m 54s)
*17:36 marostegui: Start replication db1091 - [[phab:T225060|T225060]]
*17:32 marostegui: Start MySQL with replication stopped on db1091 - [[phab:T225060|T225060]]
*16:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert user-blocks-change to use eventbus and old schema - [[phab:T211248|T211248]] (duration: 00m 54s)
*16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: use eventgate-main for 2 events on all wikis - [[phab:T211248|T211248]] (duration: 00m 55s)
*16:11 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceStreamConfig and switch 2 topics in group0 [[phab:T222822|T222822]] (duration: 00m 56s)
*16:11 XioNoX: remove BGP to AS38082 on cr4-ulsfo (left the IXP)
*15:46 reedy@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
*15:44 reedy@deploy1001: Finished scap: Rebuild .8 i18n for FlaggedRevs (duration: 41m 14s)
*15:36 moritzm: installing exim4 security updates
*15:03 reedy@deploy1001: Started scap: Rebuild .8 i18n for FlaggedRevs
*14:24 marostegui: Poweroff db1091 for BBU replacement - [[phab:T225060|T225060]]
*13:57 elukey: restart mcrouter on MediaWiki app/api canaries to pick up new config change (timeouts before marking a memcached shard as TKO from 3 to 10) - [[phab:T203786|T203786]]
*13:56 jijiki: enabling puppet and pooling on mw* canaries
*13:17 jynus: start es2,es3 backup on codfw
*13:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.8
*13:03 hashar: restarting Jenkins
*12:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 54s)
*12:46 Lucas_WMDE: EU SWAT finished
*12:32 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/WikimediaMessages/: SWAT: [[gerrit:514460{{!}}Fix wikidata copyright message (T224536)]] (duration: 00m 56s)
*11:43 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:514449{{!}}Enable the new history page in the advanced mobile contributions mode (T219895)]] (duration: 00m 56s)
*11:27 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[:gerrit:514413{{!}}Remove project namespace from flaggedrevs on ruwikisource]] ([[phab:T225037|T225037]]) (duration: 00m 54s)
*10:57 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs: [[gerrit:514456{{!}}Add ext.flaggedRevs.icons to modules registeration]] (duration: 00m 57s)
*10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 55s)
*10:09 godog: mount sdb3 on ms-be1022 - [[phab:T225079|T225079]]
*09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1135 with very low weight on s4 (duration: 00m 55s)
*09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool without traffic db1135 into s4 [[phab:T225060|T225060]]  (duration: 00m 55s)
*09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool without traffic db1135 into s4 [[phab:T225060|T225060]]  (duration: 00m 56s)
*08:42 onimisionipe: removing maps2001 from cassandra cluster. It is going to be reimaged - [[phab:T224395|T224395]]
*08:40 _joe_: rolling restart of php7 on the api servers, to test a different strategy of restarting compared to the appservers.
*08:21 _joe_: performing a rolling restart of the php appservers via cumin to test speed and safety of the operations proposed in [[phab:T224857|T224857]]
*08:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*08:12 moritzm: rebooting pybal-test2001 for tests with new qemu
*08:12 ema: pool cp3035 w/ ATS backend [[phab:T222937|T222937]]
*08:12 marostegui: Reboot db1091 [[phab:T225060|T225060]]
*08:05 moritzm: installing qemu security updates on Ganeti hosts
*07:45 marostegui: Transfer dbprov1001.eqiad.wmnet:snapshot.s4.2019-06-04--21-37-03.tar.gz to db1135 to provision it on s4 [[phab:T225060|T225060]]
*07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1091 status (duration: 00m 56s)
*07:22 ema: depool cp3035 and reimage as upload_ats [[phab:T222937|T222937]]
*07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 - host went down (duration: 00m 55s)
*06:45 marostegui: Restart MySQL on db2110 to get the binlog format changed to STATEMENT - [[phab:T220170|T220170]]
*06:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2090 to s4 codfw master [[phab:T220170|T220170]] (duration: 00m 54s)
*06:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Mimic s4 codfw weights to eqiad [[phab:T220170|T220170]] (duration: 00m 55s)
*06:17 marostegui: Start topology changes on s4 codfw to replace current master db2051 with db2090 - [[phab:T220170|T220170]]
*06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 into API (duration: 00m 54s)
*05:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 after upgrade [[phab:T224852|T224852]] (duration: 00m 55s)
*05:49 marostegui: Upgrade MySQL on db1084 [[phab:T224852|T224852]]
*05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 for upgrade [[phab:T224852|T224852]] (duration: 01m 06s)
*05:31 marostegui: Stop MySQL on db1125 (sanitarium) s2,s4,s6,s7 to upgrade mysql - [[phab:T224852|T224852]]
*05:29 marostegui: Keep compressing tables on labsdb1012 - [[phab:T222978|T222978]]
*05:22 marostegui: Change replication topology  on m3 codfw to promote db2065 as codfw master instead of db2042 - [[phab:T221533|T221533]]
*05:07 marostegui: Upgrade Mysql on labsdb1012 - [[phab:T224852|T224852]]
*04:09 onimisionipe: starting postgres slave init on maps2001 - [[phab:T224395|T224395]]


== 2019-05-31 ==
==2019-06-04==
* 21:47 aaron@deploy1001: Synchronized wmf-config/db-eqiad.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 47s)
* 21:46 aaron@deploy1001: Synchronized wmf-config/db-codfw.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 50s)
* 21:10 bblack: cp3034: repool - [[phab:T222937|T222937]]
* 20:04 bblack: cp3034: depool for reimage - [[phab:T222937|T222937]]
* 18:44 marostegui: Start MySQL on es1019 - [[phab:T213422|T213422]]
* 18:34 jgleeson: payments-wiki updated from {{Gerrit|a76658f0a3}} to {{Gerrit|c6c7bbf71e}}
* 17:29 andrewbogott: added jeh to the 'ops' group in ldap
* 16:20 ariel@deploy1001: Finished deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now (duration: 00m 03s)
* 16:20 ariel@deploy1001: Started deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now
* 15:05 bblack: cp3039: restart varnish-be for mbox lag (likely induced by 3049's depool for ATS conversion!)
* 15:00 Krinkle: krinkle@deploy1001: pulling down {{Gerrit|6f91b41}} for  php-1.34-wmf.7/extensions/ORES (without deploy), commit seems test-only
* 14:59 Krinkle: krinkle@deploy1001: git status in php-1.34-wmf.7/ is dirty (extensions/ORES)
* 14:52 bblack: pool cp3049 back into service - [[phab:T222937|T222937]]
* 14:32 onimisionipe: depool maps2004 (again) - [[phab:T224395|T224395]]
* 14:32 elukey: powercycle notebook1003 - host stuck due to user processes, no ssh available, OOM didn't trigger
* 14:20 _joe_: rolling restart of php-fpm across production to pick up the shorter revalidate frequency for [[phab:T224491|T224491]]
* 14:10 bblack: reboot cp3049 - [[phab:T222937|T222937]]
* 13:16 bblack: depool cp3049 for reimage - [[phab:T222937|T222937]]
* 11:46 jynus: stop and upgrade db2084
* 11:09 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after maintenance (duration: 00m 48s)
* 10:54 jynus: depool labsdb1010 for maintenance
* 10:47 arturo: merging multiple commits to labs/private.git. We now require `puppet-merge --labsprivate` and people may not be yet aware of that
* 09:28 jynus: stop and upgrade db2073
* 09:11 jynus: stop and upgrade db2095 (s2, s4, s6, s7)
* 08:33 jynus: upgrade and restart db2065
* 08:16 jynus: depool labsdb1011 for maintenance
* 07:54 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 with low weight (duration: 00m 49s)
* 07:43 _joe_: restarting php-fpm on canaries
* 07:24 _joe_: repooling mw1348
* 07:24 jynus: upgrade and restart labsdb1009
* 07:15 _joe_: draining mw1348 from traffic
* 07:14 jynus: depool labsdb1009 for maintenance
* 06:55 jynus: upgrade and restart db2058
* 06:33 _joe_: repooled mw1348
* 06:21 jijiki: depool mw1348
* 06:16 _joe_: restarting php-fpm on mw1348
* 00:08 jgleeson: Updating civicrm from {{Gerrit|bb4acf3d8a}} to {{Gerrit|e028bfcd63}}


== 2019-05-30 ==
*23:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change log level to debug for PageTriage (duration: 01m 03s)
* 23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
*22:06 eileen: civicrm revision changed from {{Gerrit|506ebe2f2a}} to {{Gerrit|5c02e62d6e}}, config revision is {{Gerrit|63438eea43}}
* 22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: [[phab:T219850|T219850]]
*21:08 jbond42: finished rolling reboots of mw1* servers
* 22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - [[phab:T224186|T224186]]
*21:07 jbond42: finished rolling reboots of mw1* servers
* 22:53 marxarelli: deleting stale docker images from contint1001, cc: [[phab:T207707|T207707]] [[phab:T219850|T219850]]
*20:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean with lots of file deletions but also not identical
*20:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 22:24 mutante: phab2001 - scap pull - but it fails with directory /srv/mediawiki not found  that's so wrong
*20:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings [[phab:T224686|T224686]] (duration: 00m 49s)
*20:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - [[phab:T224693|T224693]] (duration: 00m 51s)
*20:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*20:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*20:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*20:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion [[phab:T224693|T224693]] (duration: 00m 51s)
*20:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:07 XioNoX: add RPKI sessions on cr4-ulsfo - [[phab:T220669|T220669]]
*20:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 20:39 twentyafterfour: phabricator: restart ssh-phab.service
*19:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:49 mutante: sodium (mirrors) - sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
*19:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:49 Urbanecm: Morning SWAT finished
*19:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: [[:gerrit:513300{{!}}QuestionPoster: Correctly set timestamp when question is posted]] ([[phab:T223338|T223338]]) (duration: 00m 51s)
*19:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs ([[phab:T224677|T224677]])
*19:48 XioNoX: replace logstash.svc.eqiad.wmnet syslog target with syslog.codfw.wmnet on cr4-ulsfo - [[phab:T224128|T224128]]
* 18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
*19:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321, [[phab:T224491|T224491]]
*19:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321
*19:41 jbond42: reboot mwdebug1002
* 18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
*19:36 jbond42: reboot mwdebug1001
* 18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
*19:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
*19:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 bblack: re-pool cp3047 into service as ats-be - [[phab:T222937|T222937]]
*19:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
*19:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
*19:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
*19:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 jynus: stop es1019 for maintenance [[phab:T213422|T213422]]
*18:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 cmjohnson1: shutting down db1099 to swap DIMM [[phab:T221502|T221502]]
*18:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
*18:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
*18:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:06 onimisionipe: pooled maps2004 - osm import is complete - [[phab:T224395|T224395]]
*18:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 andrewbogott: reimaging cloudvirtan1001 for [[phab:T224566|T224566]]
*18:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*18:10 herron: correction — performing rolling reboots of codfw logstash hardware hosts for MDS security updates
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*18:10 herron: performing rolling reboots of eqiad logstash hardware hosts for MDS security updates
* 14:42 andrewbogott: reimaging cloudvirtan1001
*18:06 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*18:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*18:04 bblack: pool cp3045 - [[phab:T222937|T222937]]
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:22 bblack: rebooting cp3047 (post-reimage/puppetization for [[phab:T222937|T222937]])
*17:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:57 jijiki: enable puppet on mw* in eqiad
*17:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
*17:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:58 legoktm: deleted some gerrit changes
* 13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
*16:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:28 jijiki: Enabling puppet on mw*.codfw.net
*16:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: [[gerrit:513237{{!}}jquery.suggestions: Do not show suggestions on prefilled values ([T224524])]] (duration: 00m 58s)
*16:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
*16:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
*16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
*16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
*16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
*16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
*16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
*16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
*16:32 marostegui: Compress some more tables on labsdb1012 before upgrading the host tomorrow [[phab:T222978|T222978]]
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
*16:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - [[phab:T222937|T222937]]
*16:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - [[phab:T221502|T221502]]
*16:14 bblack: repool cp3035 (still varnish-be, but freshly installed!)
* 13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 [[phab:T221502|T221502]] (duration: 00m 56s)
*16:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: [[phab:T222628|T222628]] (duration: 01m 06s)
*16:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: [[phab:T222628|T222628]] (duration: 01m 04s)
*16:12 jbond42: starting rolling reboots of mw1*
* 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*16:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
* 12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:08 bblack: depool cp3045 for reimage - [[phab:T222937|T222937]]
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: JADE - [[phab:T212182|T212182]] (duration: 00m 53s)
* 12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:55 reedy@deploy1001: Synchronized wmf-config/extension-list: JADE - [[phab:T212182|T212182]] (duration: 00m 53s)
* 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:52 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Jade: Consistency (duration: 01m 08s)
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:50 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Configure eventgate-main EventService. No-op in prod. [[phab:T211248|T211248]] (duration: 01m 19s)
* 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:41 bblack: reboot cp3035 post-reimage
* 12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Use eventgate-main in beta. No-op in prod. [[phab:T211248|T211248]] (duration: 00m 49s)
* 12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.8
* 12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:13 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:13 moritzm: draining ganeti1003 for eventual reboot to MDS-enabled Linux kernel
* 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:13 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache (duration: 29m 46s)
* 12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*15:04 moritzm: failover Ganeti master in eqiad to ganeti1001
* 12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:51 bblack: depool cp3035 for ATS reimage - [[phab:T222937|T222937]]
* 11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:43 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache
* 11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:41 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 [keeping static files] (duration: 01m 38s)
* 11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:39 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 01m 34s)
* 11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:36 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 (duration: 11m 02s)
* 11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:53 jbond42: restart mtail on lithium
* 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:46 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:46 fsero@cumin1001: START - Cookbook sre.hosts.decommission
* 11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:30 jbond42: starting rolling reboots of mw1*
* 11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:12 moritzm: draining ganeti1008 for eventual reboot to MDS-enabled Linux kernel
* 11:34 akosiaris: reboot ganeti2003 for kernel upgrades
*12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:22 Urbanecm: ran mwscript deleteBatch.php --wiki=sawikisource -r '[[phab:T214553|T214553]]: deleting useless red
* 11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:13 akosiaris: restart pybal on lvs2003, lvs1015 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 46s)
* 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:04 akosiaris: restart pybal on lvs2006 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:40 akosiaris: restart pybal on lvs1015 for sessionstore LVS configuration. [[phab:T220401|T220401]]
* 11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:39 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/includes/: [[phab:T221577|T221577]] / {{Gerrit|1286d131c01886}} (duration: 01m 04s)
* 11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:39 jijiki: enabling puppet on mc1*
* 11:14 _joe_: freed opcache on mw1281
*11:38 Urbanecm: run mwscript namespaceDupes.php --wiki=kuwiktionary --fix ([[phab:T224327|T224327]])
* 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:514239{{!}}Custom namespaces for ku.wiktionary]] ([[phab:T224327|T224327]]) (duration: 00m 46s)
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:507931{{!}}Add localized project logo for sahwikiquote]] (2/2, [[phab:T222065|T222065]]) (duration: 00m 47s)
* 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:34 urbanecm@deploy1001: Synchronized static/images/project-logos/: [[:gerrit:507931{{!}}Add localized project logo for sahwikiquote]] (1/2, [[phab:T222065|T222065]]) (duration: 00m 47s)
* 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:31 jijiki: enabling puppet on mc2*
* 11:05 Urbanecm: EU SWAT finished
*11:29 Urbanecm: running mwscript namespaceDupes.php --wiki=sawikisource --add-prefix=[[phab:T214553|T214553]] --fix ([[phab:T214553|T214553]])
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: [[:gerrit:Enable abusefilter blocking ability in plwiki]] ([[phab:T224617|T224617]]) (duration: 00m 58s)
*11:28 Urbanecm: run mwscript namespaceDupes.php --wiki=thwiki --fix ([[phab:T216322|T216322]])
* 11:00 jijiki: Disable puppet on mw* servers to merge 507939 - [[phab:T219150|T219150]]
*11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:486221{{!}}Add Author namespace in Sanskrit Wikisource]] ([[phab:T214553|T214553]]) (duration: 00m 46s)
* 10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
*11:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: [[:gerrit:495918{{!}}Create new protection levels for dewiktionary]] (2/2, [[phab:T216885|T216885]]) (duration: 00m 47s)
* 10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:495918{{!}}Create new protection levels for dewiktionary]] (1/2, [[phab:T216885|T216885]]) (duration: 00m 47s)
* 10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:494016{{!}}Add editcontentmodel right to the templateeditor group on testwiki]] ([[phab:T217499|T217499]]) (duration: 00m 47s)
* 10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:491054{{!}}Add new namespaces for th.wiki]] ([[phab:T216322|T216322]]) (duration: 00m 47s)
* 10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/: [[phab:T221577|T221577]] / {{Gerrit|1286d131c01886}} (duration: 01m 07s)
* 10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:02 moritzm: draining ganeti1007 for eventual reboot to MDS-enabled Linux kernel
* 10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:44 jbond42: mw1* restarts will be delayed until 11:15
* 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:42 jbond42: will start rolling reboots of mw1* servers at 10:50
* 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*09:27 moritzm: draining ganeti1006 for eventual reboot to MDS-enabled Linux kernel
* 10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*09:25 jijiki: disable puppet on mc* hosts to merge 511963 and 511973
* 10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*09:01 moritzm: draining ganeti1005 for eventual reboot to MDS-enabled Linux kernel
* 10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:32 elukey: remove memcached nutcracker config from mw1* hosts (not used). Changes will be picked up when nutcracker is restarted (after reboots, etc.) - [[phab:T214275|T214275]]
* 10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*08:23 moritzm: draining ganeti1004 for eventual reboot to MDS-enabled Linux kernel
* 10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*08:04 marostegui: Stop MySQL on db2046 to clone db2058 - [[phab:T221533|T221533]]
* 10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 47s)
* 10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
*08:03 elukey: restart hive-server2 on an-coord1001 to pick up new GC/Heap settings
* 09:42 jynus: stop and upgrade db1102
*07:35 mobrovac@deploy1001: Finished deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - [[phab:T210651|T210651]] (duration: 19m 16s)
* 09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:31 _joe_: depooling mw1261 for benchmarking for [[phab:T224491|T224491]]
*07:21 moritzm: draining ganeti1002 for eventual reboot to MDS-enabled Linux kernel
* 09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
*07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2058 from s4 to s6 (duration: 00m 47s)
* 08:54 jynus: stop and restart db1089 for upgrade
*07:16 mobrovac@deploy1001: Started deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - [[phab:T210651|T210651]]
* 08:50 onimisionipe: maps2001 postgres initialization - [[phab:T224395|T224395]]
*06:57 elukey: restart hive metastore on an-coord1001 to apply new GC/heap settings
* 08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
*06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after upgrade (duration: 00m 48s)
* 08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
*06:21 elukey: restart pdfrender on scb1002 (flapping)
* 08:10 mobrovac: drop old Parsoid tables from cassandra -- [[phab:T223998|T223998]]
*06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after upgrade (duration: 00m 47s)
* 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]] (duration: 19m 28s)
*05:54 marostegui: Stop MySQL on db2078:m3 - [[phab:T221533|T221533]]
* 07:33 _joe_: upgraded service-checker on icinga1001,2
*05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after upgrade (duration: 00m 47s)
* 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]]
*05:40 marostegui: Stop MySQL on db1091 for MySQL upgrade [[phab:T224852|T224852]]
* 00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - [[phab:T224393|T224393]] (duration: 00m 56s)
*05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 for upgrade (duration: 00m 48s)
* 00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore ([[phab:T221389|T221389]])
*05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097 after upgrade (duration: 00m 46s)
* 00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 ([[phab:T221389|T221389]])
*05:19 marostegui: Stop MySQL on db1097 for upgrade
*05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade (duration: 00m 47s)
*04:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1081 from API (duration: 00m 49s)
*01:10 bstorm_: [[phab:T223406|T223406]] depooled/repooled labsdb1009 for view updates
*00:09 bstorm_: [[phab:T223406|T223406]] repooled labsdb1011 after completing view updates


== 2019-05-29 ==
==2019-06-03==
* 23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
* 23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
* 23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
* 23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC {{Gerrit|I8c6a4a95e}} (duration: 00m 54s)
* 23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers {{Gerrit|Ia52f644948}} (duration: 00m 56s)
* 23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
* 23:15 mutante: repooled phab2001-vcs , fixes pybal / lvs alerts
* 23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata [[phab:T55652|T55652]] (duration: 00m 57s)
* 23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
* 22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
* 22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
* 22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy [[phab:T224634|T224634]] to fix CirrusSearch for extension registration (duration: 00m 57s)
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( [[phab:T224323|T224323]]
* 21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
* 20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet ([[phab:T224323|T224323]])
* 20:35 arlolra: Updated Parsoid to {{Gerrit|8546c79}} ([[phab:T219927|T219927]], [[phab:T211125|T211125]])
* 20:35 ejegg: updated payments-wiki from {{Gerrit|332aaa96e2}} to {{Gerrit|45b73e7749}}
* 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}} (duration: 07m 46s)
* 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}}
* 20:10 bblack: pool cp3044 (esams cache_upload ats-be) - [[phab:T222937|T222937]]
* 19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
* 19:45 XioNoX: enable cr1-codfw:et-0/2/1 - [[phab:T224511|T224511]]
* 19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
* 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie ([[phab:T190568|T190568]])
* 19:09 XioNoX: enable cr1-codfw:et-0/2/0 - [[phab:T224511|T224511]]
* 18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
* 17:44 XioNoX: enable cr1-codfw:et-0/0/1 - [[phab:T224511|T224511]]
* 17:13 XioNoX: enable cr1-codfw:et-0/0/0 - [[phab:T224511|T224511]]
* 17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 3/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 2/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 1/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512942]] Revert: Hardcode korean help desk config (duration: 00m 56s)
* 16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512941]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
* 16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512940]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
* 16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: [[gerrit:512950]] Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
* 16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 15:55 jynus: upgrade and restart db2087
* 15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 ([[phab:T188327|T188327]]) (duration: 00m 57s)
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
* 14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - [[phab:T224511|T224511]]
* 14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - [[phab:T224511|T224511]]
* 14:47 XioNoX: disable et- interfaces on cr1-codfw - [[phab:T224511|T224511]]
* 14:45 andrewbogott: reimaging cloudcontrol1003 [[phab:T221770|T221770]]
* 14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
* 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
* 14:30 _joe_: installing the new service checker on restbase in eqiad
* 14:29 _joe_: installing new service checker version on restbase in codfw
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:48 urandom: decommissioning restbase1015-c -- [[phab:T223976|T223976]]
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
* 13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
* 13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished ([[phab:T221441|T221441]])
* 13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
* 13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:42 Zppix: [12:27:02] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
* 12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]] (duration: 00m 56s)
* 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]]
* 12:33 Zppix: [11:58:16] <arturo> [[phab:T221770|T221770]] icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
* 12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
* 12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
* 12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:511849{{!}}Set wgLocaltimezone for euwiki to Europe/Berlin]] ([[phab:T224091|T224091]]) (duration: 00m 57s)
* 12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:29 Zppix: [11:55:09] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:471260{{!}}RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site]] ([[phab:T208458|T208458]]) (duration: 00m 57s)
* 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' ([[phab:T221441|T221441]])
* 11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: [[:gerrit:505228{{!}}Remove uploader user group from fawiki and merge it with autoconfirmed]], part 2 ([[phab:T221441|T221441]]) (duration: 00m 55s)
* 11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:505228{{!}}Remove uploader user group from fawiki and merge it with autoconfirmed]], part 1 ([[phab:T221441|T221441]]) (duration: 00m 55s)
* 11:40 Urbanecm: Purged angwikibooks HD logos
* 11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: [[:gerrit:512433{{!}}Add HD logo for angwikibooks]], logo files ([[phab:T150618|T150618]]) (duration: 00m 56s)
* 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512478{{!}}Enable transwiki import between sqwiki and sqwikiquote]] ([[phab:T221234|T221234]]) (duration: 00m 56s)
* 11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509130{{!}}Enable Advanced Mobile Contributions Overflow menu (T223883)]] (duration: 00m 57s)
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512488{{!}}Remove bureaucrat protection level for all Serbian projects]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512487{{!}}Fix Serbian projects wgRestrictionLevels]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
* 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:506892{{!}}Add namespace aliases on zhwiktionary]] ([[phab:T222024|T222024]]) (duration: 00m 57s)
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
* 10:57 Urbanecm: deleteBatch.php for srwikinews finished ([[phab:T212346|T212346]])
* 10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
* 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
* 09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
* 09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
* 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
* 07:42 mobrovac: decommission restbase1015-b -- [[phab:T223976|T223976]]
* 07:40 godog: ms-be2043 start sdd rebuild - [[phab:T222654|T222654]]
* 07:03 jijiki: restarting pdfrender on scb1003


== 2019-05-28 ==
*22:20 bstorm_: [[phab:T223406|T223406]] depooled labsdb1011
* 23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
*22:09 bstorm_: [[phab:T223406|T223406]] repooled labsdb1010 after completing view updates
* 23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
*21:29 XioNoX: drop all ICMP frag on all routers - [[phab:T224186|T224186]]
* 23:17 bstorm_: [[phab:T221339|T221339]] completed view updates on labsdb1009 without depooling
*19:57 XioNoX: stop sampling from cr2-eqiad
* 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
*18:48 XioNoX: Add RPKI validators to all routers - [[phab:T220669|T220669]]
* 23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
*18:35 hashar: switch most Quibble jobs to node 10 [[phab:T222406|T222406]] - https://gerrit.wikimedia.org/r/#/c/integration/config/+/514034/ [[phab:T222406|T222406]]
* 23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration {{Gerrit|I77d70519f}} {{Gerrit|Id0cd2e18c}} (duration: 00m 56s)
*18:35 XioNoX: drop all ICMP frag on cr1/2-eqiad - [[phab:T224186|T224186]]
* 23:10 bstorm_: [[phab:T221339|T221339]] repooled labsdb1011
*18:17 XioNoX: add routinator 0.4.0 to APT repo - [[phab:T220669|T220669]]
* 23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules {{Gerrit|I4ba3d489}} (duration: 00m 55s)
*17:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4 (duration: 11m 29s)
* 23:06 bstorm_: [[phab:T221339|T221339]] depooled labsdb1011 and updated views
*17:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4
* 23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[phab:T55652|T55652]] Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
*16:40 onimisionipe: started osm-import on maps2004 - [[phab:T224395|T224395]]
* 22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis [[phab:T223793|T223793]] (duration: 00m 57s)
*16:30 bstorm_: [[phab:T223406|T223406]] depooled labsdb1010 for view updates
* 22:28 cstone_: Re-enabled fundraising thank you mail job
*15:39 bstorm_: [[phab:T223406|T223406]] labsdb1012 updated views for actor table changes
* 22:25 mutante: cp3034 - sudo -i varnish-backend-restart
*14:46 akosiaris: deploy kask in sessionstore kubernetes namespace in eqiad, codfw [[phab:T220401|T220401]]
* 22:18 cstone_: Updated fundraising civicrm from {{Gerrit|21afd001b6}} to {{Gerrit|bb4acf3d8a}}
*14:34 arturo: [[phab:T221769|T221769]] reimaging cloudservices1003 to stretch
* 22:14 mutante: cp3035 - varnish-backend-restart
*14:20 vgutierrez: upgrading acme-chief to version 0.17 in acme-chief production instances - [[phab:T220518|T220518]]
* 22:13 bstorm_: repooled labsdb1010
*13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:09 mutante: cp3034 - restart varnish backend
*13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 22:09 XioNoX: restart varnish backend on cp3039
*13:53 moritzm: draining ganeti1001 for eventual reboot to MDS-enabled Linux kernel
* 22:02 cstone_: Disabled fundraising thank you mail job
*13:44 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Drop caption edit counter unlock delay to 0 (duration: 00m 49s)
* 21:46 bstorm_: depool labsdb1010 for view updates
*13:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1138 into s4 API (duration: 00m 48s)
* 21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
*13:19 marostegui: Move db2078:3321 under db2062 [[phab:T220170|T220170]]
* 21:35 urandom: decommissioning restbase1015-a -- [[phab:T223976|T223976]]
*13:03 arturo: add prometheus-pdns-rec-exporter v0.7 to stretch-wikimedia ([[phab:T224877|T224877]])
* 21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
*12:56 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on remaining wikis ([[phab:T188327|T188327]]) (duration: 00m 48s)
* 21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
*12:24 arturo: add prometheus-pdns-exporter v0.4 to stretch-wikimedia ([[phab:T224877|T224877]])
* 21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
*11:28 gehel: reboot relforge for microcode + jvm upgrade
* 20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) ([[phab:T187790|T187790]])
*11:17 jijiki: Restarting php7.2-fpm in eqiad in batches of 2 for 513949
* 20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
*11:15 Urbanecm: EU SWAT done
* 19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
*11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:513740{{!}}Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki]] ([[phab:T224215|T224215]]) (duration: 00m 47s)
* 19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 55s)
*11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:503680{{!}}Add 5 active namespaces for VisualEditor on en.wikiversity]] ([[phab:T220881|T220881]]) (duration: 00m 48s)
* 19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 56s)
*11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:513720{{!}}Add "Zerrenda" (list) namespace to VisualEditor on euwiki]] ([[phab:T224801|T224801]]) (duration: 00m 48s)
* 19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - [[phab:T224393|T224393]]
*10:52 moritzm: upgrading maps servers to new Java security release
* 18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
*10:47 moritzm: upgrading WDQS servers to new Java security release
* 18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
*10:42 vgutierrez: upgrading prometheus-trafficserver-exporter in upload_ats ulsfo instances
* 18:11 marostegui: Start mysql for s2 and s4 on db2091 [[phab:T224393|T224393]]
*10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:513972{{!}}Bumping portals to master (T128546)]] (duration: 00m 47s)
* 17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:513972{{!}}Bumping portals to master (T128546)]] (duration: 00m 49s)
* 17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:36 jijiki: Restarting php7.2-fpm in codfw in batches of 2 for 513949
* 17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:34 moritzm: upgrading Elastic servers to new Java security release
* 17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*10:26 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service (duration: 03m 15s)
* 17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:23 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service
* 17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=kartotherian
* 17:42 moritzm: rebooting yubiauth* servers for kernel update
*10:02 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=kartotherian
* 17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*09:48 onimisionipe: depooled maps codfw due to lag and disk issues - [[phab:T224395|T224395]]
* 17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*09:46 moritzm: upgrading Druid/Kafka-Jumbo servers to new Java security release (will be picked up by forthcoming MDS reboots)
* 17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}} (duration: 05m 56s)
*09:43 moritzm: upgrading AQS servers to new Java security release (will be picked up by forthcoming MDS reboots)
* 17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*09:33 moritzm: upgrading Hadoop servers to new Java security release (will be picked up by forthcoming MDS reboots)
* 17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*08:18 ema: cp1077: restart varnish-be
* 17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}}
*08:17 elukey: manually removed phab_clean_tmp from www-data's crontab on phab1001 to reduce cronspam
* 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:16 ema: cp1075: restart varnish-be
* 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*08:03 marostegui: Stop MySQL on db1064 [[phab:T223217|T223217]]
* 17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:01 marostegui: Remove db1064 from tendril and zarcillo [[phab:T223217|T223217]]
* 17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*07:58 elukey: refresh field list for logstash (via kibana Management -> Index patterns -> etc..)
* 17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*07:48 marostegui: Repool db1103 after upgrade [[phab:T224852|T224852]]
* 17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*07:29 marostegui: Stop MySQL on db1103 (s2 and s4) for upgrade [[phab:T224852|T224852]]
* 16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 for upgrade (duration: 00m 47s)
* 16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*07:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API after upgrade (duration: 00m 48s)
* 16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*06:50 elukey: roll restart varnishkafka (via puppet) for a config change - [[phab:T224236|T224236]]
* 16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*06:46 kartik@deploy1001: scap-helm cxserver finished
* 16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*06:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
*06:45 kartik@deploy1001: scap-helm cxserver finished
* 16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*06:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
*06:44 kartik@deploy1001: scap-helm cxserver finished
* 16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*06:44 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*06:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 16:15 moritzm: rearmed keyholder on deploy2001 following reboot
*06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API after upgrade (duration: 00m 49s)
* 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 after upgrade (duration: 00m 46s)
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*06:04 marostegui: Stop MySQL on db1081 for upgrade - [[phab:T224852|T224852]]
* 16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 for upgrade (duration: 00m 47s)
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 [[phab:T213422|T213422]] (duration: 00m 46s)
* 16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*05:45 marostegui: Upgrade mariadb on dbstore1004 - [[phab:T224852|T224852]]
* 16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*05:17 marostegui: Upgrade MariaDB on codfw hosts in preparation for s4 master failover [[phab:T217396|T217396]]
* 15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*05:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 [[phab:T213422|T213422]] (duration: 00m 46s)
* 15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*05:05 marostegui: Remove db2037 from tendril and zarcillo [[phab:T224720|T224720]]
* 15:54 papaul: shutting down db2091 for firmware upgrade
*05:04 marostegui: Stop MySQL on db2037 for decommission [[phab:T224720|T224720]]
* 15:53 godog: put back wrongly-replaced sdf on ms-be2043 - [[phab:T222654|T222654]]
*04:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 [[phab:T213422|T213422]] (duration: 00m 51s)
* 15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
* 15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
* 15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512909{{!}}Enable extension EntitySchema in production]] (duration: 00m 56s)
* 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: [[gerrit:512911{{!}}Steal maintenance script user]] (duration: 00m 58s)
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
* 15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912{{!}}Steal maintenance script user]] – forgot `git submodule update` before previous sync (duration: 00m 57s)
* 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912{{!}}Steal maintenance script user]] (duration: 00m 59s)
* 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 14:57 jbond42: reboot ms-be2016
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
* 14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
* 14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
* 14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:50 _joe_: hhvm restarted on mwdebug1001
* 13:48 _joe_: stopping hhvm on mwdebug1001 for testing
* 13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
* 13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]] (duration: 00m 03s)
* 13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]]
* 13:31 gilles@deploy1001: deploy aborted: [[phab:T224388|T224388]] (duration: 00m 01s)
* 13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
* 13:24 urandom: decommissioning restbase1014-c -- [[phab:T223976|T223976]]
* 13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]] (duration: 00m 04s)
* 12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
* 12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]] (duration: 00m 06s)
* 12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]]
* 12:13 raynor: EU SWAT done
* 12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512743{{!}}Disable the rdf2latex Collection portlet format (T224433)]] (duration: 00m 55s)
* 12:00 raynor: EU SWAT re-opened
* 11:58 Lucas_WMDE: EU SWAT done
* 11:54 Lucas_WMDE: ^ error, no change to wiki
* 11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
* 11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: [[gerrit:512689{{!}}Add maintenance script to create preexisting Schemas]] + [[gerrit:512717{{!}}Small maintenance script adjustments]] (duration: 00m 54s)
* 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: [[gerrit:512677{{!}}Skip configured IDs]] (duration: 00m 57s)
* 11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511753{{!}}Add a list of IDs to skip in production]] (duration: 00m 54s)
* 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: [[gerrit:510204{{!}}Add feature flag config for breaking Wikibase API change (T223300)]] (duration: 00m 54s)
* 11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
* 11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:512426{{!}}Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects]] ([[phab:T223039|T223039]]) (duration: 00m 54s)
* 11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: [[:gerrit:512422{{!}}Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308)]] (duration: 02m 36s)
* 10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 03m 00s)
* 10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
* 10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
* 10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
* 09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - [[phab:T201409|T201409]] (duration: 01m 12s)
* 09:28 moritzm: installing php5 security updates
* 09:00 moritzm: installing ffmpeg security updates
* 08:58 gehel: rebooting wdqs nodes for kernel upgrade
* 08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - [[phab:T219148|T219148]] (duration: 01m 21s)
* 08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - [[phab:T219148|T219148]]
* 08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
* 08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - [[phab:T220518|T220518]] [[phab:T213820|T213820]]
* 08:40 volans: [[phab:T224448|T224448]] sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
* 08:29 volans: restarting gerrit due to stuck threads - [[phab:T224448|T224448]]
* 07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
* 07:02 mobrovac: decommission restbase1014-b -- [[phab:T223976|T223976]]
* 06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - [[phab:T219150|T219150]] (duration: 00m 51s)
* 00:38 urandom: decommissioning restbase1014-a -- [[phab:T223976|T223976]]


== 2019-05-27 ==
* 23:19 thcipriani: gerrit back after restarting due to [[phab:T224448|T224448]]
* 23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
* 22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]]
* 22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]]
* 20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
* 20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: {{Gerrit|66556bf37e8}} / [[phab:T223310|T223310]], [[phab:T223978|T223978]] (duration: 00m 50s)
* 18:06 krinkle@deploy1001: Synchronized errorpages/: {{Gerrit|4ffcbfc2ba3}} (duration: 00m 48s)
* 17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
* 17:37 andrewbogott: refreshing puppet-compiler facts
* 16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
* 16:34 mobrovac: decommission restbase1013-c - [[phab:T223976|T223976]]
* 15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220402|T220402]]
* 15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220401|T220401]]
* 13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - [[phab:T223518|T223518]]
* 11:33 onimisionipe: starting osm initial import on maps2004 - [[phab:T224395|T224395]]
* 10:35 mobrovac: decommission restbase1013-b - [[phab:T223976|T223976]]
* 10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
* 09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - [[phab:T219148|T219148]] (duration: 01m 09s)
* 09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - [[phab:T219148|T219148]]
* 09:52 _joe_: disabling puppet on mw1261, running some tests for [[phab:T223180|T223180]]
* 08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
* 08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - [[phab:T224393|T224393]] (duration: 00m 49s)
* 08:03 gehel: depool maps2004 - [[phab:T224395|T224395]]
* 07:05 gehel: running nodetool repair on maps2004 - [[phab:T224395|T224395]]
* 04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
* 04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 02:59 urandom: decommissioning restbase1013-a -- [[phab:T223976|T223976]]


== 2019-05-26 ==
* 20:39 urandom: decommissioning restbase1012-c -- [[phab:T223976|T223976]]
* 14:09 urandom: decommissioning restbase1012-b -- [[phab:T223976|T223976]]
* 13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: [[phab:T187147|T187147]] / {{Gerrit|2be7aa4bc4af36}} (duration: 00m 51s)
* 08:01 mobrovac: decommission restbase1012-a - [[phab:T223976|T223976]]


== 2019-05-25 ==
* 22:41 urandom: decommissioning restbase1011-c -- [[phab:T223976|T223976]]
* 22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: [[phab:T222628|T222628]] / {{Gerrit|c735a545df3a}} (duration: 00m 51s)
* 19:12 andrewbogott: reimaging cloudservices1004 with Stretch
* 13:46 urandom: decommissioning restbase1011-b -- [[phab:T223976|T223976]]
* 12:28 godog: bounce thumbor on thumbor1002
* 12:21 godog: bounce thumbor on thumbor1002
* 11:48 _joe_: restarted thumbor-instances on thumbor1001
* 09:20 mobrovac: decommission restbase1011-b - [[phab:T223976|T223976]]
* 04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
* 04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
* 00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy [[phab:T224319|T224319]] for VisualEditor switching and auto-restore (duration: 00m 50s)


== 2019-05-24 ==
* 21:56 urandom: decommissioning restbase1011-a -- [[phab:T223976|T223976]]
* 16:34 XioNoX: add routinator package to reprepro/APT - [[phab:T220669|T220669]]
* 15:44 urandom: decommissioning restbase1010-c -- [[phab:T223976|T223976]]
* 15:30 XioNoX: disable bgp to telia on cr1-codfw for X-connect investigation - [[phab:T222967|T222967]]
* 15:01 jbond42: upload python{,3}-statsd.3.2.1-2 to jessie-wikimedia
* 14:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/objectcache/: {{Gerrit|d262078b1}} / [[phab:T220470|T220470]] (duration: 01m 06s)
* 11:45 hoo: Updated the Wikidata property suggester with data from the 2019-05-13 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 11:32 jbond42: [actually] rebooting prometheus1004 now
* 11:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond42: rebooting prometheus1004
* 10:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 jbond42: rebooting prometheus2003
* 10:25 jbond42: rebooting prometheus2004
* 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:09 mobrovac: decommission restbase1010-b - [[phab:T223976|T223976]]
* 07:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:32 moritzm: rebooting labweb* for kernel security update
* 07:05 mobrovac: restbase-dev1006 force-stop the cassandra instances, fsync exception during decomm - [[phab:T224260|T224260]]
* 06:47 moritzm: bounced ferm on mw2286, wasn't correctly started after reboot
* 06:45 mobrovac: restbase-dev1006 decommission cass-b - [[phab:T224260|T224260]]
* 06:43 _joe_: disable notifications in icinga for restbase-dev1006 [[phab:T224260|T224260]]
* 06:40 mobrovac: restbase-dev1006 decommission cass-a - [[phab:T224260|T224260]]
* 06:39 mobrovac: restbase-dev1006 stop restbase - [[phab:T224260|T224260]]
* 06:38 mobrovac: restbase-dev1006 puppet disabled - [[phab:T224260|T224260]]
* 06:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing (duration: 05m 41s)
* 06:20 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing
* 06:20 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - [[phab:T215956|T215956]] [[phab:T224055|T224055]] (duration: 21m 30s)
* 06:17 marostegui: Stop MySQL on db2078:m1 to clone db2062 - [[phab:T220170|T220170]]
* 06:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to new hosts [[phab:T220170|T220170]] (duration: 00m 48s)
* 05:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - [[phab:T215956|T215956]] [[phab:T224055|T224055]]
* 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2062 from config [[phab:T220170|T220170]] (duration: 00m 48s)
* 05:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2062 from config [[phab:T220170|T220170]] (duration: 00m 49s)
* 05:30 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
* 00:32 XioNoX: remove lvs1001-5 bgp sessions from cr1/2-eqiad - [[phab:T224223|T224223]]
* 00:27 XioNoX: remove term protect-old-lvs-servers from cr1/2-eqiad - [[phab:T224223|T224223]]
* 00:20 urandom: decommissioning restbase1010-a -- [[phab:T223976|T223976]]
* 00:04 ebernhardson@deploy1001: Finished scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ [[phab:T223738|T223738]] Consider searching out of limits an error (duration: 21m 32s)


== 2019-05-23 ==
* 23:43 ebernhardson@deploy1001: Started scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ [[phab:T223738|T223738]] Consider searching out of limits an error
* 23:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VII–X, InitialiseSettings (duration: 00m 48s)
* 23:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VII–X, CommonSettings (duration: 00m 47s)
* 23:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VI, InitialiseSettings (duration: 00m 47s)
* 22:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VI, CommonSettings (duration: 00m 48s)
* 22:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup V, InitialiseSettings (duration: 00m 47s)
* 22:56 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup V, CommonSettings (duration: 00m 47s)
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup IV, InitialiseSettings (duration: 00m 47s)
* 22:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup IV, CommonSettings (duration: 00m 48s)
* 22:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup III, InitialiseSettings (duration: 00m 47s)
* 22:47 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup III, CommonSettings (duration: 00m 48s)
* 22:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup II, InitialiseSettings (duration: 00m 48s)
* 22:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup II, CommonSettings (duration: 00m 48s)
* 22:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup I, InitialiseSettings (duration: 00m 47s)
* 22:37 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup I, CommonSettings (duration: 00m 48s)
* 22:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseClusterSquid, never varied, no longer used (duration: 00m 48s)
* 22:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgUseClusterSquid, never varied (duration: 00m 47s)
* 22:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 3 (duration: 00m 47s)
* 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 2 (duration: 00m 48s)
* 22:23 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 1 (duration: 00m 48s)
* 22:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223793|T223793]] Drop wmgVisualEditorSingleEditTabSecondaryEditor and wmgVisualEditorSecondaryTabs from InitialiseSettings (duration: 00m 48s)
* 22:17 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T223793|T223793]] Read wmgVisualEditorIsSecondaryEditor in CommonSettings (duration: 00m 48s)
* 22:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223793|T223793]] Add wmgVisualEditorIsSecondaryEditor to InitialiseSettings (duration: 00m 49s)
* 19:48 ejegg: updated payments-wiki from {{Gerrit|786d76e212}} to {{Gerrit|332aaa96e2}}
* 18:54 urandom: decommissioning restbase1009-c -- [[phab:T223976|T223976]]
* 16:13 twentyafterfour: restarting phd on phab1003 to pick up new php module config
* 15:57 moritzm: rebooting furud/flerovium for kernel updates
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:33 ottomata: rolling restart of swift-proxy to apply creation of analytics_admin account
* 15:31 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Hardcode Korean help desk config - [[phab:T224224|T224224]] (duration: 00m 48s)
* 15:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:31 jbond42: reboot thumbor2004
* 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 jbond42: reboot thumbor2003
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 jbond42: reboot thumbor2002
* 14:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 jbond42: reboot thumbor2001
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 jbond42: reboot thumbor1004
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 jbond42: reboot thumbor1003
* 14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:28 jbond42: reboot thumbor1002
* 14:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 13:56 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Echo: SWAT: [[gerrit:512070{{!}}Don't add CommentStoreComment as plaintext params]] (duration: 00m 50s)
* 13:55 urandom: decommissioning restbase1009-b -- [[phab:T223976|T223976]]
* 13:41 bblack: stopped pybal on lvs1001-6 - [[phab:T224223|T224223]]
* 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.6
* 13:00 godog: swift eqiad-prod: ms-be1033 weight to 1500 - [[phab:T223518|T223518]]
* 12:04 moritzm: powercycling mw2268 (stuck after reboot)
* 11:50 jbond42: will shortly start rolling reboots of thumbor servers
* 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:34 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 moritzm: rebooting auth1002 for kernel update
* 11:21 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 10:51 Amir1: Deploying EntitySchema to testwikidatawiki is done
* 10:50 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=wikidatawiki extensions/EntitySchema/sql/EntitySchema.sql ([[phab:T216955|T216955]])
* 10:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511844{{!}}deploy WikibaseSchema to test (T216956)]] (duration: 00m 56s)
* 10:44 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=testwikidatawiki extensions/EntitySchema/sql/EntitySchema.sql ([[phab:T216956|T216956]])
* 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1080 (duration: 00m 57s)
* 10:15 _joe_: restarted php7.2-fpm on mw1261 to assess the effect of a larger APCu shm size [[phab:T223180|T223180]]
* 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 moritzm: rebooting remaining mw servers in codfw (sans mcrouter proxies for now)
* 10:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:51 hashar@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection: Rename wfAjaxCollectionGetItemList() [[phab:T224093|T224093]] (duration: 00m 57s)
* 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 into API (duration: 00m 55s)
* 09:22 godog: bounce rsyslog on lithium - listener stuck - [[phab:T199406|T199406]]
* 09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:10 moritzm: rebooting scb servers in eqiad
* 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 55s)
* 08:29 marostegui: Upgrade MySQL and kernel on db1080
* 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
* 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 moritzm: rebooting scb servers in codfw
* 07:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 56s)
* 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:33 moritzm: rebooting swift frontends in eqiad
* 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 53s)
* 07:11 marostegui: Stop MySQL on db1117:3323 to clone db1128 [[phab:T222682|T222682]]
* 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 55s)
* 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 55s)
* 06:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 56s)
* 06:14 mobrovac: start ruwiki dumps to fill the new parsoid tables - [[phab:T215956|T215956]]
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2070 as m5 codfw master - [[phab:T221533|T221533]] (duration: 00m 54s)
* 05:29 marostegui: Promote db2070 to m5 codfw master instead of db2037 - [[phab:T221533|T221533]]
* 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2107 status - will be the new master (duration: 00m 54s)
* 05:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1136 into s7 [[phab:T222682|T222682]] (duration: 00m 55s)
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1136 into s7 [[phab:T222682|T222682]] (duration: 00m 55s)
* 04:57 mobrovac: decommission restbase1009-a - [[phab:T223976|T223976]]
* 04:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
* 04:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 58s)
* 04:24 mobrovac: start nl, pt, pl wiki dumps to fill the new parsoid tables - [[phab:T215956|T215956]]
* 03:50 twentyafterfour: m3 database activity levels look like they have returned to normal
* 03:48 twentyafterfour: puppet runs cleanly on phab1003
* 03:39 mutante: phab1003 - disabling puppet; /etc/php/7.2/fpm/conf.d# ln -s /etc/php/7.2/mods-available/ldap.ini 20-ldap.ini ; systemctl restart php7.2-fpm
* 03:27 twentyafterfour: restarted php-fpm on phab1003
* 02:56 mutante: phab1001 - removing community_metrics and project_changes cron jobs to avoid duplicate mails
* 02:51 mutante: phab1003 - chown -R phd /srv/repos/
* 02:41 twentyafterfour: downtimed the systemd state on phab1001 for 1 year
* 02:35 mutante: phabricator - going read-write again
* 02:24 twentyafterfour: manually started aphlict on phab1003
* 02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
* 02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
* 01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
* 01:49 mutante: puppetmaster1001 - conftool-merge
* 01:41 eileen: civicrm revision changed from {{Gerrit|e6e846708f}} to {{Gerrit|21afd001b6}}, config revision is {{Gerrit|87e78d3eac}}
* 01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
* 01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
* 01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
* 01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
* 01:28 twentyafterfour: stopping phd on phab1001
* 01:18 mutante: phabricator going readonly momentarily
* 01:09 twentyafterfour: extended phab downtime in icinga, actual downtime hasn't started yet, prep work taking longer than expected
* 00:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e040c6c]: Deploy GUI update (duration: 09m 54s)
* 00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001
* 00:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e040c6c]: Deploy GUI update
* 00:33 ejegg: updated payments-wiki from {{Gerrit|fa005a0640}} to {{Gerrit|786d76e212}}


== 2019-05-22 ==
* 23:30 twentyafterfour: scheduling downtime for phabricator from 0:00 to 1:00 utc
* 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
* 22:18 mdholloway: mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
* 22:17 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}, take 2 (duration: 07m 19s)
* 22:15 foks: reset user email and password for Nv8200pa
* 22:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}, take 2
* 22:09 mdholloway: mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
* 22:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}} (duration: 03m 25s)
* 22:08 foks: reset user email and password for DarkKyoushu
* 22:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}
* 21:51 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: [[phab:T222539|T222539]] / {{Gerrit|734b3d84f7}} (duration: 00m 56s)
* 21:47 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: [[phab:T222539|T222539]] / {{Gerrit|3cb01cc73ce9}} (duration: 00m 56s)
* 21:41 urandom: decommissioning restbase1008-c -- [[phab:T223976|T223976]]
* 20:46 mdholloway: mobileapps rolled back deployment due to endpoint check failures
* 20:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}, take 2 (duration: 04m 19s)
* 20:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}, take 2
* 20:38 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}} (duration: 02m 41s)
* 20:35 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}
* 19:26 jforrester@deploy1001: Finished scap: Re-build i18n and re-scap everything for i18n issues for [[phab:T224116|T224116]] [[phab:T224124|T224124]] [[phab:T220731|T220731]] (duration: 32m 55s)
* 18:53 jforrester@deploy1001: Started scap: Re-build i18n and re-scap everything for i18n issues for [[phab:T224116|T224116]] [[phab:T224124|T224124]] [[phab:T220731|T220731]]
* 18:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for [[phab:T224116|T224116]] [[phab:T224124|T224124]] (duration: 00m 58s)
* 18:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up {{Gerrit|Ic99cf063a}} (duration: 01m 00s)
* 17:38 bblack: repool cp3046 as esams cache_upload ats-be node - [[phab:T222937|T222937]]
* 17:06 urandom: decommissioning restbase1008-b -- [[phab:T223976|T223976]]
* 16:17 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 [[phab:T224116|T224116]] [[phab:T224124|T224124]] # [[phab:T220731|T220731]]
* 15:11 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
* 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
* 15:07 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 15:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 15:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
* 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
* 14:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
* 14:49 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
* 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 jbond@cumin1001: conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
* 14:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
* 14:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
* 14:17 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
* 14:14 hashar: 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to [[phab:T224116|T224116]]
* 14:14 mobrovac: start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - [[phab:T215956|T215956]]
* 14:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
* 14:09 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:02 marostegui: Stop MySQL on db2078 for upgrade
* 13:58 bblack: depool cp3046 for reimage to ats-be - [[phab:T222937|T222937]]
* 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:57 moritzm: rebooting swift frontends in codfw
* 13:46 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
* 13:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
* 13:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
* 13:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
* 13:27 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: [[phab:T224092|T224092]] (duration: 00m 58s)
* 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s)
* 13:06 urandom: decommissioning restbase1008-a -- [[phab:T223976|T223976]]
* 12:39 marostegui: Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - [[phab:T224017|T224017]]
* 12:35 bblack: cp3035: restarting varnish backend
* 12:34 marostegui: Stop replication on db1080 to rebuild revision table - [[phab:T224017|T224017]]
* 12:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table [[phab:T224017|T224017]] (duration: 00m 55s)
* 11:30 Amir1: EU SWAT is done
* 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:503342{{!}}Remove constraint-suggestions beta feature (T220609)]] (duration: 00m 57s)
* 11:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509878{{!}}Add configuration for EntitySchema ShExSimpleUrl (T223120)]] (duration: 00m 56s)
* 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511674{{!}}[SDC] Enable depicts qualifiers on testcommons]] (duration: 00m 57s)
* 10:01 vgutierrez: restarting varnish-backend on cp3039
* 09:52 mobrovac: start the en, fr and de wiki dumps again to populate the new parsoid table - [[phab:T215956|T215956]]
* 09:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - [[phab:T215956|T215956]] (duration: 27m 07s)
* 09:42 marostegui: Stop MySQL on db2078:m5 to clone db2070 - [[phab:T221533|T221533]]
* 09:16 mobrovac@deploy1001: Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - [[phab:T215956|T215956]]
* 08:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s)
* 08:51 marostegui@deploy1001: sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s)
* 08:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s)
* 08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s)
* 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s)
* 07:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights [[phab:T220170|T220170]] (duration: 00m 55s)
* 07:36 mobrovac: decommission restbase1007-c - [[phab:T223976|T223976]]
* 07:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights [[phab:T220170|T220170]] (duration: 01m 06s)
* 07:23 marostegui: Restart MySQL on db2090 to change binlog format [[phab:T220170|T220170]]
* 06:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2040 from config [[phab:T224079|T224079]] (duration: 00m 55s)
* 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2040 from config [[phab:T224079|T224079]] (duration: 00m 56s)
* 06:13 marostegui: Remove db2040 from zarcillo and tendril - [[phab:T224079|T224079]]
* 06:01 marostegui: Stop MySQL on db2040 - [[phab:T224079|T224079]]
* 05:42 marostegui: Stop MySQL on db1086 to clone db1136
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 [[phab:T222772|T222772]] (duration: 00m 55s)
* 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 [[phab:T222772|T222772]] (duration: 00m 55s)
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead [[phab:T224017|T224017]] (duration: 00m 57s)
* 04:41 gilles: purging ruwiki and eswiki to make them get the new origin trial tokens
* 04:39 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
* 03:22 legoktm: removed 2fa for [[phab:T224075|T224075]]
* 01:46 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: {{Gerrit|68eeaa5b76738a6a07d148391220cdb6c8fd1d23}} (duration: 00m 57s)
* 01:22 aaron@deploy1001: Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: {{Gerrit|447bf504e498e2c18f29b90f7760514102236e4e}} (duration: 00m 57s)


== 2019-05-21 ==
* 23:47 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511668/ (duration: 00m 57s)
* 23:34 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511667/ (duration: 00m 56s)
* 22:56 mutante: ms-be2034 - degraded systemd state was cleared and originally caused by "failed Session 72587 of user debmonitor"
* 22:56 mutante: ms-be2034 - sudo systemctl reset-failed
* 22:51 urandom: decommissioning restbase1007-b -- [[phab:T223976|T223976]]
* 21:35 ejegg: updated payments-wiki from {{Gerrit|d5ef5ad067}} to {{Gerrit|fa005a0640}}
* 21:21 mutante: re-enabling puppet on mc1* hosts
* 20:43 mutante: re-enabling puppet on all hosts using memcached class - except mc1*
* 20:31 mutante: mc2019 - stopping memcached and letting puppet restart it to confirm no issues after switching to systemd::service
* 20:20 mutante: disabling puppet on all servers using class memcached (57)
* 20:06 tzatziki: removing (another) two files for legal compliance
* 19:43 tzatziki: removing two files for legal compliance
* 19:12 thcipriani: gerrit back on 2.15.13
* 19:09 thcipriani: restart gerrit for 2.15.13 update
* 19:08 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming) (duration: 00m 20s)
* 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming)
* 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only) (duration: 00m 11s)
* 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only)
* 18:50 bblack: repooling cp1085 frontends (weren't meant to be depooled)
* 18:38 bblack: re-pooling eqiad front edge traffic (onto new LVSes from [[phab:T184293|T184293]])
* 18:36 XioNoX: update lvs static routes on cr1/2-eqiad - [[phab:T184293|T184293]]
* 18:06 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 (turning on HA queues)
* 17:59 bblack: rebooting lvs1016 in attempt to clear interface config issues - [[phab:T224027|T224027]]
* 17:51 XioNoX: add BGP sessions to AS202053 in esams
* 17:31 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected (again, after merging last-minute fixup https://gerrit.wikimedia.org/r/c/operations/puppet/+/511759 )
* 17:25 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected
* 17:24 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1006, basically no-op
* 17:21 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1015, bringing back pybal in primary role, shifting traffic to lvs1015
* 17:20 bblack: eqiad LVS: low-traffic (all internal services): disable pybal on lvs1016 + lvs1015, shifting traffic to lvs1006
* 17:18 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/includes/CollectionHooks.php: Fix paths (duration: 00m 56s)
* 17:17 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1005, basically no-op
* 17:15 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1002, bringing back pybal in backup role, no traffic shift
* 17:13 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1014, bringing back pybal in primary role, shifting traffic to lvs1014
* 17:11 bblack: eqiad LVS: high-traffic2 (upload): disable pybal on lvs1014 + lvs1002, shifting traffic to lvs1005
* 17:09 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1004, basically no-op
* 17:07 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1001, bringing back pybal in backup role, no traffic shift
* 17:06 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1013, bringing back pybal in primary role, shifting traffic to lvs1013
* 17:04 bblack: eqiad LVS: high-traffic1 (text): disable pybal on lvs1013 + lvs1001, shifting traffic to lvs1004
* 16:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:55 jbond42: rebooting wtp1046-1048
* 16:55 bblack: starting Eqiad LVS re-arrangement shortly - [[phab:T184293|T184293]] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/511717 (eqiad front edge is still depooled from public traffic)
* 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:50 jbond42: rebooting wtp1043-1045
* 16:46 mutante: rebooting phab1003 (non-prod)
* 16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 jbond42: rebooting wtp1040-1042
* 16:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 jbond42: rebooting wtp1037-1039
* 16:26 mobrovac: truncate "others_T_parsoid".data
* 16:25 mobrovac: restbase truncate "commons_T_parsoid".data
* 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 jbond42: rebooting wtp1033-1034
* 16:18 mobrovac: restbase truncate "enwiki_T_parsoid".data
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 jbond42: rebooting wtp1031-1032
* 16:10 mobrovac: restbase truncate "wikipedia_T_parsoid".data
* 16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:09 jbond42: rebooting wtp1029-1030
* 16:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:01 jbond42: rebooting wtp1027-1028
* 15:56 urandom: decommissioning restbase1007-a -- [[phab:T208087|T208087]]
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 jbond42: rebooting wtp1025-1026
* 15:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007 (duration: 02m 43s)
* 15:42 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007
* 15:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found (duration: 02m 40s)
* 15:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:40 jbond42: rebooting wtp2019-2020
* 15:39 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found
* 15:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2 (duration: 00m 45s)
* 15:38 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2
* 15:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - [[phab:T215956|T215956]] (duration: 07m 10s)
* 15:37 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Moving to 10% of users on php7 [[phab:T219150|T219150]] (duration: 00m 57s)
* 15:32 XioNoX: enable BGP to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:30 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - [[phab:T215956|T215956]]
* 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:23 jbond42: rebooting wtp2017-2018
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 jbond42: rebooting wtp2015-2016
* 15:10 XioNoX: disable BGP to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:05 jbond42: rebooting wtp2013-2014
* 15:02 crusnov@deploy1001: Finished deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - [[phab:T220422|T220422]] (duration: 00m 55s)
* 15:01 crusnov@deploy1001: Started deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - [[phab:T220422|T220422]]
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 jbond42: rebooting wtp2011-2012
* 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.6
* 14:50 jbond42: rebooting wtp2009-2010
* 14:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 jbond42: rebooting wtp2007-2008
* 14:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 jbond42: rebooting wtp2005-2006
* 14:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:31 jbond42: rebooting wtp2003-2004
* 14:27 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # [[phab:T220731|T220731]] (duration: 48m 09s)
* 14:26 volans: restarting wikibugs
* 14:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:25 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 jbond42: rebooting wtp2001-2002
* 13:50 bblack: rebooting lvs1013,14,15 for verification
* 13:39 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # [[phab:T220731|T220731]]
* 13:37 hashar@deploy1001: Pruned MediaWiki: 1.34.0-wmf.1 (duration: 02m 12s)
* 13:36 hashar: scap clean --verbose --delete 1.34.0-wmf.1  # [[phab:T220731|T220731]]
* 13:29 hashar: scap clean --verbose --delete 1.33.0-wmf.25  # [[phab:T220731|T220731]]
* 13:25 godog: swift eqiad-prod: start depool ms-be1033 - [[phab:T223518|T223518]]
* 13:24 hashar: Applied security patches to 1.34.0-wmf.6 # [[phab:T220731|T220731]]
* 13:24 hashar: Applied security patches to 1.34.0-wmf.6
* 13:23 bblack: rebooting lvs1013 (possibly a few times, debugging startup issues)
* 13:20 hashar: scap prep 1.34.0-wmf.6  # [[phab:T220731|T220731]]
* 13:11 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
* 13:09 hashar: Restarting Jenkins [[phab:T224002|T224002]]
* 12:45 hashar: Cutting branch wmf/1.34.0-wmf.6 # [[phab:T220731|T220731]]
* 12:22 volans: restarting Icinga on icinga1001 to pick up new open files limits
* 12:08 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - [[phab:T219148|T219148]] (duration: 00m 54s)
* 12:07 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - [[phab:T219148|T219148]]
* 11:59 mobrovac: started dewiki dumps - [[phab:T215956|T215956]]
* 11:58 mobrovac: started frwiki dumps - [[phab:T215956|T215956]]
* 11:46 mobrovac: started enwiki dumps - [[phab:T215956|T215956]]
* 11:27 Amir1: EU SWAT is done
* 11:27 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:511658{{!}}Revert "Switch off php7 for investigation of production instabilities"]] (duration: 00m 50s)
* 11:20 volans: restarting Icinga on icinga2001 (passive server) to pick up new open file limits
* 11:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 jbond42: reboot wtp1025.eqiad.wmnet
* 11:10 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:505816{{!}}Define wmgUseEntitySchema (T221651)]], part II (duration: 00m 49s)
* 11:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - [[phab:T215956|T215956]] (duration: 25m 50s)
* 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:505816{{!}}Define wmgUseEntitySchema (T221651)]], part I (duration: 00m 50s)
* 11:07 godog: swift codfw-prod: remove ms-be201[345] - [[phab:T221068|T221068]]
* 10:59 _joe_: rolling restart of php7.2-fpm across the fleet to pick up a config change
* 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - [[phab:T215956|T215956]]
* 10:39 jijiki: updating prometheus-mcrouter-exporter on mw* servers
* 10:26 godog: pool new restbase hosts - [[phab:T219404|T219404]]
* 10:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
* 09:49 moritzm: updated buster netboot image to daily image from 20190521
* 09:26 moritzm: reimaging graphite2001 to buster for some d-i tests
* 08:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2104 as candidate master and as API (duration: 00m 51s)
* 08:56 marostegui: Stop MySQL on db2041 as it will be decommissioned [[phab:T223950|T223950]]
* 06:59 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Turning off php7 sampling for investigation in [[phab:T223952|T223952]] (duration: 00m 53s)
* 06:55 elukey: reboot of stat100[4,5,6,7] and notebook100[3,4] for kernel upgrades
* 06:31 marostegui: Stop mariadb on db2104 to convert it to s2 candidate master
* 06:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2104 (duration: 00m 51s)
* 05:50 marostegui: Remove db2041 from tendril and zarcillo - [[phab:T223950|T223950]]
* 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2041 for decommissioning [[phab:T223950|T223950]] (duration: 00m 51s)
* 05:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2041 for decommissioning [[phab:T223950|T223950]] (duration: 00m 51s)
* 05:16 marostegui: Stop MySQL on db2040
* 05:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 50s)
* 05:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2114 into s6 - [[phab:T222772|T222772]] (duration: 00m 50s)
* 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2114 into s6 - [[phab:T222772|T222772]] (duration: 00m 51s)
* 03:36 urandom: bootstrapping restbase1027-c -- [[phab:T219404|T219404]]
* 00:47 urandom: bootstrapping restbase1027-b -- [[phab:T219404|T219404]]
* 00:05 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/libs/objectcache/APCUBagOStuff.php: {{Gerrit|982299d635623279}} (duration: 00m 54s)
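The entries above all follow the same shape: a bullet, an HH:MM timestamp, an actor (a nick or user@host), a colon, and a free-form message. As a minimal sketch only, assuming that shape holds for every entry, a line could be split into its parts as follows. The names LogEntry and parse_entry are hypothetical and not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One admin-log bullet, split into its three visible parts."""
    time: str     # "HH:MM" as written in the log
    actor: str    # nick or user@host, e.g. "jbond@cumin1001"
    message: str  # everything after the first "actor: "


# Assumed shape: "* HH:MM actor: message". The space after "*" is optional
# because some entries on this page are written as "*HH:MM ...".
_ENTRY_RE = re.compile(r"^\*\s*(\d{2}:\d{2})\s+([^:\s]+):\s*(.*)$")


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return a LogEntry for a bullet line, or None for headings and blanks."""
    match = _ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return LogEntry(*match.groups())
```

Date headings and blank lines return None, so a caller can iterate over a whole day's section and keep only the entries that parse.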


== 2019-05-20 ==
* 21:07 ejegg: updated payments-wiki from {{Gerrit|8397ccf9cc}} to {{Gerrit|d5ef5ad067}}
* 19:20 mobrovac: bootstrap restbase1027-a - [[phab:T219404|T219404]]
* 18:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/Linker.php: [[phab:T222857|T222857]] / {{Gerrit|Iecc2140fabd3}} (duration: 00m 54s)
* 16:43 onimisionipe: rolling reboot of maps eqiad to pick up kernel upgrades
* 16:38 mobrovac: bootstrap restbase1026-c - [[phab:T219404|T219404]]
* 15:26 onimisionipe: rebooting codfw maps to pick up kernel upgrades
* 15:26 marostegui: Stop replication on labsdb1011 to start compressing tables - [[phab:T222978|T222978]]
* 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 0 ([[phab:T188327|T188327]]) (duration: 00m 55s)
* 14:54 bblack: rebooting lvs1013, lvs1014, lvs1015 (not in active service, yet)
* 14:43 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - [[phab:T219148|T219148]] (duration: 00m 55s)
* 14:42 jiji@deploy1001: Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - [[phab:T219148|T219148]]
* 14:21 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
* 14:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
* 13:58 mobrovac: bootstrap restbase1026-b - [[phab:T219404|T219404]]
* 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 50s)
* 11:44 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:44 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:28 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:28 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:21 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 mobrovac: bootstrap restbase1026-a - [[phab:T219404|T219404]]
* 11:16 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:01 arturo: icinga downtime toolschecker for 3h for [[phab:T223332|T223332]]
* 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:511398{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:511398{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 10:27 moritzm: rebooting contint1001 for kernel update
* 10:25 hashar: contint1001: docker image prune -f  {{!}} Total reclaimed space: 7.115GB {{!}} [[phab:T207707|T207707]]
* 10:20 hashar: Stopped Zuul gracefully
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:18 fsero: puppet reenabled certs renewed - [[phab:T221346|T221346]]
* 10:08 fsero: rolling over certs into mcrouter proxies codfw - [[phab:T221346|T221346]]
* 10:03 fsero: rolling over certs into mcrouter proxies eqiad - [[phab:T221346|T221346]]
* 09:42 marostegui: Remove db2036 from tendril and zarcillo - [[phab:T223885|T223885]]
* 09:39 marostegui: Stop MySQL on db2036 [[phab:T223885|T223885]]
* 09:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2036, going to be decommissioned [[phab:T223885|T223885]] (duration: 00m 49s)
* 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2036, going to be decommissioned [[phab:T223885|T223885]] (duration: 00m 49s)
* 09:36 fsero: rolling over new certs to all mcrouter hosts except proxies - [[phab:T221346|T221346]]
* 09:26 fsero: continuing to roll over new certs - [[phab:T221346|T221346]]
* 09:01 fsero: disabling puppet on mcrouter hosts for regenerating certs - [[phab:T221346|T221346]]
* 08:49 moritzm: installing atftpd security updates
* 08:43 mobrovac: bootstrap restbase1025-c - [[phab:T219404|T219404]]
* 08:38 moritzm: installing samba security updates
* 08:36 moritzm: installing ghostscript security updates on jessie
* 08:25 moritzm: installing cups-filter security updates on jessie (prerequisite for ghostscript security update)
* 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 48s)
* 07:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 50s)
* 06:25 elukey: rebuild and upload memkeys 20181031-1 to stretch-wikimedia
* 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 49s)
* 06:20 elukey: upgrade memkeys to version 20181031-1 on all the mc* hosts (was deployed only on a few of them) - [[phab:T208376|T208376]]
* 06:11 mobrovac: bootstrap restbase1025-b - [[phab:T219404|T219404]]
* 06:00 elukey: powercycle analytics1071 - soft lockups error messages in the dmesg
* 05:51 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
* 05:42 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to repool labsdb1009 and restore original weights
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1126 into s8, db1134 into s1 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1126 into s8, db1134 into s1 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:12 marostegui: Stop MySQL on db2046
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 50s)
* 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 00m 55s)
* 02:42 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart


==2019-05-30==
*23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
*22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: [[phab:T219850|T219850]]
*22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - [[phab:T224186|T224186]]
*22:53 marxarelli: deleting stale docker images from contint1001, cc: [[phab:T207707|T207707]] [[phab:T219850|T219850]]
*22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean with lots of file deletions but also not identical
*22:24 mutante: phab2001 - scap pull - but it fails with directory /srv/mediawiki not found - that's so wrong
*22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings [[phab:T224686|T224686]] (duration: 00m 49s)
*22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - [[phab:T224693|T224693]] (duration: 00m 51s)
*21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
*21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion [[phab:T224693|T224693]] (duration: 00m 51s)
*21:07 XioNoX: add RPKI sessions on cr4-ulsfo - [[phab:T220669|T220669]]
*20:39 twentyafterfour: phabricator: restart ssh-phab.service
*19:49 mutante: sodium (mirrors) - sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
*18:49 Urbanecm: Morning SWAT finished
*18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: [[:gerrit:513300{{!}}QuestionPoster: Correctly set timestamp when question is posted|gerrit:513300QuestionPoster: Correctly set timestamp when question is posted]] ([[phab:T223338|T223338]]) (duration: 00m 51s)
*18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs ([[phab:T224677|T224677]])
*18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
*18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321, [[phab:T224491|T224491]]
*18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321
*18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
*18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
*18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
*16:24 bblack: re-pool cp3047 into service as ats-be - [[phab:T222937|T222937]]
*16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
*16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
*16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
*15:36 jynus: stop es1019 for maintenance [[phab:T213422|T213422]]
*15:26 cmjohnson1: shutting down db1099 to swap DIMM [[phab:T221502|T221502]]
*15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
*15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
*15:06 onimisionipe: pooled maps2004 - osm import is complete - [[phab:T224395|T224395]]
*14:44 andrewbogott: reimaging cloudvirtan1001 for [[phab:T224566|T224566]]
*14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:42 andrewbogott: reimaging cloudvirtan1001
*14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:22 bblack: rebooting cp3047 (post-reimage/puppetization for [[phab:T222937|T222937]])
*14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:57 jijiki: enable puppet on mw* in eqiad
*13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
*13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
*13:28 jijiki: Enabling puppet on mw*.codfw.net
*13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: <nowiki>[[gerrit:513237|jquery.suggestions: Do not show suggestions on prefilled values ([T224524])]]</nowiki> (duration: 00m 58s)
*13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
*13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
*13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
*13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
*13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
*13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
*13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
*13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
*13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
*13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - [[phab:T222937|T222937]]
*13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - [[phab:T221502|T221502]]
*13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 [[phab:T221502|T221502]] (duration: 00m 56s)
*13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: [[phab:T222628|T222628]] (duration: 01m 06s)
*12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: [[phab:T222628|T222628]] (duration: 01m 04s)
*12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:34 akosiaris: reboot ganeti2003 for kernel upgrades
*11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:14 _joe_: freed opcache on mw1281
*11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:05 Urbanecm: EU SWAT finished
*11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: [[:gerrit:Enable abusefilter blocking ability in plwiki]] ([[phab:T224617|T224617]]) (duration: 00m 58s)
*11:00 jijiki: Disable puppet on mw* servers to merge 507939 - [[phab:T219150|T219150]]
*10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
*10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
*09:42 jynus: stop and upgrade db1102
*09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*09:31 _joe_: depooling mw1261 for benchmarking for [[phab:T224491|T224491]]
*09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
*08:54 jynus: stop and restart db1089 for upgrade
*08:50 onimisionipe: maps2001 postgres initialization - [[phab:T224395|T224395]]
*08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
*08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
*08:10 mobrovac: drop old Parsoid tables from cassandra -- [[phab:T223998|T223998]]
*07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]] (duration: 19m 28s)
*07:33 _joe_: upgraded service-checker on icinga1001,2
*07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]]
*00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - [[phab:T224393|T224393]] (duration: 00m 56s)
*00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore ([[phab:T221389|T221389]])
*00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 ([[phab:T221389|T221389]])


== 2019-05-19 ==
==2019-05-29==
* 20:16 ariel@deploy1001: Finished deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace (duration: 00m 03s)
* 20:16 ariel@deploy1001: Started deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace
* 17:51 mobrovac: bootstrap restbase1025-a - [[phab:T219404|T219404]]
* 13:26 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: [[phab:T223734|T223734]]: Depool cloudelastic100[12] (duration: 00m 49s)
* 12:37 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: update (duration: 00m 57s)
* 10:32 reedy@deploy1001: Synchronized wikiversions-labs.json: [[phab:T223770|T223770]] (duration: 00m 48s)
* 10:31 reedy@deploy1001: Synchronized dblists/all-labs.dblist: [[phab:T223770|T223770]] (duration: 00m 51s)
* 10:12 mobrovac: bootstrap restbase1024-c - [[phab:T219404|T219404]]
* 09:59 ebernhardson: eqiad psi elasticsearch high disk watermark to 89% to allow unallocated shard to initialize
* 09:56 ebernhardson: eqiad psi elasticsearch low disk watermark to 79% to allow unallocated shard to initialize
* 08:13 jijiki: varnish-backend-restart on cp1087
* 06:56 mobrovac: bootstrap restbase1024-b - [[phab:T219404|T219404]]
* 05:09 marostegui: varnish-backend-restart on cp1081


== 2019-05-18 ==
*23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
* 23:53 bblack: rebooting lvs1015 for interface changes
*23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
* 22:44 bblack: imaging lvs1013-lvs1015
*23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
* 21:01 bblack: depooling eqiad public front edge in authdns
*23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC {{Gerrit|I8c6a4a95e}} (duration: 00m 54s)
* 19:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/Collection/templates/CollectionSuggestTemplate.php: [[phab:T223742|T223742]] / {{Gerrit|89bd434a21a745ec}} (duration: 00m 49s)
*23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers {{Gerrit|Ia52f644948}} (duration: 00m 56s)
* 19:16 mobrovac: bootstrap restbase1024-a - [[phab:T219404|T219404]]
*23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
* 18:50 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T222146|T222146]] / {{Gerrit|9385b2dd66}} (duration: 00m 50s)
*23:15 mutante: repooled phab2001-vcs , fixes pybal / lvs alerts
* 16:53 mobrovac: bootstrap restbase1023-c - [[phab:T219404|T219404]]
*23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 15:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/TimedMediaHandler/includes/handlers/WebMHandler/WebMHandler.php: [[phab:T223445|T223445]] / {{Gerrit|a9df59c59d7a30}} (duration: 00m 51s)
*23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata [[phab:T55652|T55652]] (duration: 00m 57s)
* 14:59 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: whitespace is srs (duration: 00m 49s)
*23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
* 14:56 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Copy in default config (duration: 01m 04s)
*22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
* 13:51 urandom: bootstrapping restbase1023-b - [[phab:T219404|T219404]]
*22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
* 05:41 mobrovac: bootstrap rb1023-a - [[phab:T219404|T219404]]
*22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy [[phab:T224634|T224634]] to fix CirrusSearch for extension registration (duration: 00m 57s)
* 02:37 urandom: bootstrapping restbase1022-c - [[phab:T219404|T219404]]
*21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
*21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( [[phab:T224323|T224323]]
*21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
*20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet ([[phab:T224323|T224323]])
*20:35 arlolra: Updated Parsoid to {{Gerrit|8546c79}} ([[phab:T219927|T219927]], [[phab:T211125|T211125]])
*20:35 ejegg: updated payments-wiki from {{Gerrit|332aaa96e2}} to {{Gerrit|45b73e7749}}
*20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}} (duration: 07m 46s)
*20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}}
*20:10 bblack: pool cp3044 (esams cache_upload ats-be) - [[phab:T222937|T222937]]
*19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
*19:45 XioNoX: enable cr1-codfw:et-0/2/1 - [[phab:T224511|T224511]]
*19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
*19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
*19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie ([[phab:T190568|T190568]])
*19:09 XioNoX: enable cr1-codfw:et-0/2/0 - [[phab:T224511|T224511]]
*18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
*17:44 XioNoX: enable cr1-codfw:et-0/0/1 - [[phab:T224511|T224511]]
*17:13 XioNoX: enable cr1-codfw:et-0/0/0 - [[phab:T224511|T224511]]
*17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:501926|Change arwiki default user preferences]], part 3/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
*17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[gerrit:501926|Change arwiki default user preferences]], part 2/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
*16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:501926|Change arwiki default user preferences]], part 1/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
*16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512942]] Revert: Hardcode korean help desk config (duration: 00m 56s)
*16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512941]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
*16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512940]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
*16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: [[gerrit:512950]] Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
*16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
*15:55 jynus: upgrade and restart db2087
*15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
*15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 ([[phab:T188327|T188327]]) (duration: 00m 57s)
*14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
*14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - [[phab:T224511|T224511]]
*14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - [[phab:T224511|T224511]]
*14:47 XioNoX: disable et- interfaces on cr1-codfw - [[phab:T224511|T224511]]
*14:45 andrewbogott: reimaging cloudcontrol1003 [[phab:T221770|T221770]]
*14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
*14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
*14:30 _joe_: installing the new service checker on restbase in eqiad
*14:29 _joe_: installing new service checker version on restbase in codfw
*14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
*13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
*13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:48 urandom: decommissioning restbase1015-c -- [[phab:T223976|T223976]]
*13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
*13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
*13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished ([[phab:T221441|T221441]])
*13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
*13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
*13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
*13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
*13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
*13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
*13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
*13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
*12:42 Zppix: [12:27:02]  jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
*12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
*12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
*12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
*12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
*12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]] (duration: 00m 56s)
*12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]]
*12:33 Zppix: [11:58:16] <arturo> [[phab:T221770|T221770]] icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
*12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
*12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
*12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:511849|Set wgLocaltimezone for euwiki to Europe/Berlin]] ([[phab:T224091|T224091]]) (duration: 00m 57s)
*12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*12:29 Zppix: [11:55:09]  jbond@cumin1001 START - Cookbook sre.hosts.downtime
*11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:471260|RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site]] ([[phab:T208458|T208458]]) (duration: 00m 57s)
*11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
*11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' ([[phab:T221441|T221441]])
*11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: [[gerrit:505228|Remove uploader user group from fawiki and merge it with autoconfirmed]], part 2 ([[phab:T221441|T221441]]) (duration: 00m 55s)
*11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:505228|Remove uploader user group from fawiki and merge it with autoconfirmed]], part 1 ([[phab:T221441|T221441]]) (duration: 00m 55s)
*11:40 Urbanecm: Purged angwikibooks HD logos
*11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: [[gerrit:512433|Add HD logo for angwikibooks]], logo files ([[phab:T150618|T150618]]) (duration: 00m 56s)
*11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512478|Enable transwiki import between sqwiki and sqwikiquote]] ([[phab:T221234|T221234]]) (duration: 00m 56s)
*11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509130|Enable Advanced Mobile Contributions Overflow menu]] ([[phab:T223883|T223883]]) (duration: 00m 57s)
*11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512488|Remove bureaucrat protection level for all Serbian projects]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
*11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512487|Fix Serbian projects wgRestrictionLevels]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
*11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:506892|Add namespace aliases on zhwiktionary]] ([[phab:T222024|T222024]]) (duration: 00m 57s)
*11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
*10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
*10:57 Urbanecm: deleteBatch.php for srwikinews finished ([[phab:T212346|T212346]])
*10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
*10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
*09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
*09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
*09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
*08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
*08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
*07:42 mobrovac: decommission restbase1015-b -- [[phab:T223976|T223976]]
*07:40 godog: ms-be2043 start sdd rebuild - [[phab:T222654|T222654]]
*07:03 jijiki: restarting pdfrender on scb1003


== 2019-05-17 ==
==2019-05-28==
* 23:55 urandom: bootstrapping restbase1022-b - [[phab:T219404|T219404]]
* 23:11 foks: removing one file for legal compliance
* 15:20 hashar@deploy1001: Synchronized php-1.34.0-wmf.5/includes/api/ApiUpload.php: Revert "Always validate uploads over api" - [[phab:T223448|T223448]] ([[phab:T222994|T222994]] [[phab:T223446|T223446]]) (duration: 01m 00s)
* 15:18 hashar: Deploying hotfix https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/510924/ . Should restore upload of large files on commons and other wikis #[[phab:T223448|T223448]] (poke [[phab:T222994|T222994]] [[phab:T223446|T223446]])
* 14:51 mobrovac: bootstrap restbase1022-a - [[phab:T219404|T219404]]
* 14:43 fsero: re-enabling puppet on mcrouter hosts for [[phab:T221346|T221346]], checks are in place; if there is any alert for cert expiration and mcrouter, this is the source :)
* 14:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098 & db1131 after maintenance (duration: 00m 49s)
* 14:09 fsero: second round of setting up cert check, disabling puppet on mcrouter hosts [[phab:T221346|T221346]]
* 12:58 mobrovac: bootstrap restbase1021-c - [[phab:T219404|T219404]]
* 10:59 mobrovac: bootstrap restbase1021-b - [[phab:T219404|T219404]]
* 09:27 godog: swift remove ms-be101[345] from rings - [[phab:T220590|T220590]]
* 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
* 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:24 fsero: reenabling puppet after reverting [[phab:T221346|T221346]]
* 08:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 59s)
* 07:57 fsero: disabling puppet on mcrouter hosts for [[phab:T221346|T221346]]
* 07:12 marostegui: Compress s7 on labsdb1012 [[phab:T222978|T222978]]
* 06:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2111 and db2113 into s5 [[phab:T222772|T222772]] (duration: 00m 49s)
* 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2111 and db2113 into s5 [[phab:T222772|T222772]] (duration: 00m 50s)
* 05:19 marostegui: Stop MySQL on db1083 to clone db1134
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 50s)
* 05:00 mobrovac: bootstrap 1021-a - [[phab:T219404|T219404]]


== 2019-05-16 ==
*23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
* 21:02 Jeff_Green: authdns-update to switch payments.wikimedia.org back to eqiad cluster
*23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
* 19:24 onimisionipe: pooling elastic2038 - shards are properly balanced across nodes
*23:17 bstorm_: [[phab:T221339|T221339]] completed view updates on labsdb1009 without depooling
* 18:31 onimisionipe: depooling elastic2038 to investigate more
*23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
* 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
* 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration {{Gerrit|I77d70519f}} {{Gerrit|Id0cd2e18c}} (duration: 00m 56s)
* 17:26 jbond42: reboot ores1007-1009
*23:10 bstorm_: [[phab:T221339|T221339]] repooled labsdb1011
* 17:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules {{Gerrit|I4ba3d489}} (duration: 00m 55s)
* 17:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*23:06 bstorm_: [[phab:T221339|T221339]] depooled labsdb1011 and updated views
* 17:15 jbond42: reboot ores1005-1006
*23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[phab:T55652|T55652]] Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
* 17:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis [[phab:T223793|T223793]] (duration: 00m 57s)
* 17:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*22:28 cstone_: Re-enabled fundraising thank you mail job
* 17:10 jbond42: reboot ores1003-1004
*22:25 mutante: cp3034 - sudo -i varnish-backend-restart
* 17:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*22:18 cstone_: Updated fundraising civicrm from {{Gerrit|21afd001b6}} to {{Gerrit|bb4acf3d8a}}
* 17:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*22:14 mutante: cp3035 - varnish-backend-restart
* 17:05 jbond42: reboot ores1001-1002
*22:13 bstorm_: repooled labsdb1010
* 17:00 jbond42: reboot orespoolcounter[12]002
*22:09 mutante: cp3034 - restart varnish backend
* 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*22:09 XioNoX: restart varnish backend on cp3039
* 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*22:02 cstone_: Disabled fundraising thank you mail job
* 16:53 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*21:46 bstorm_: depool labsdb1010 for view updates
* 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
* 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*21:35 urandom: decommissioning restbase1015-a -- [[phab:T223976|T223976]]
* 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
* 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
* 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 16:51 jbond42: reboot orespoolcounter[12]001
*20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) ([[phab:T187790|T187790]])
* 16:44 jbond42: reboot ores2008-2009
*20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
* 16:38 jbond42: will first reboot ores2006-2007
*19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 16:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 55s)
* 16:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 56s)
* 16:36 jbond42: reboot ores2006-2009
*19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - [[phab:T224393|T224393]]
* 16:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
* 16:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
* 16:28 jbond42: reboot ores2003-2005
*18:11 marostegui: Start mysql for s2 and s4 on db2091 [[phab:T224393|T224393]]
* 16:22 XioNoX: add BGP session to Hetzner in AMS-IX
*17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:19 akosiaris: switch all etcd* kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template
*17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:17 jbond42: reboot ores2001-2002
*17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:59 akosiaris: build service-checker OCI container 0.0.2 with 0.1.5 service-checker version [[phab:T220401|T220401]]
*17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:49 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN [[phab:T223449|T223449]] (duration: 00m 49s)
*17:42 moritzm: rebooting yubiauth* servers for kernel update
* 15:45 marostegui: Drop the following databases from tendril to recreate them with the right user: db1127,db1129,db1130, db1131, db1137,db1138
*17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix [[phab:T223440|T223440]] (duration: 00m 53s)
*17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:25 aborrero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb
*17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}} (duration: 05m 56s)
* 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 jbond42: rebooting aqs1009
*17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}}
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 jbond42: rebooting aqs1008
*17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jbond42: rebooting aqs1007
*17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 jbond42: rebooting aqs1006
*16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:28 jbond42: rebooting aqs1005
*16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:18 moritzm: powercycling mw2199, stuck during reboot
*16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
* 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
*16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
*16:15 moritzm: rearmed keyholder on deploy2001 following reboot
* 13:57 marostegui: and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 [[phab:T222772|T222772]]
*16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:39 cmjohnson1: replacing pdu in rack B5 eqiad
*16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5
*16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 arturo: labweb1001 depooled
*16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 mobrovac: bootstrap restbase1020-c - [[phab:T219404|T219404]]
*15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:21 godog: stop swift and rsync on ms-be10[16,17,18,32,33] for eqiad B5 pdu replacement - [[phab:T223126|T223126]]
*15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:03 jynus: stop and shutdown db1098,db1131,db1139 [[phab:T223126|T223126]]
*15:54 papaul: shutting down db2091 for firmware upgrade
* 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:53 godog: put back wrongly-replaced sdf on ms-be2043 - [[phab:T222654|T222654]]
* 11:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
*15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 moritzm: rebooting mw app servers in codfw for kernel update
*15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 hoo@deploy1001: Synchronized wmf-config/extension-list: Add EntitySchema to extension-list ([[phab:T221650|T221650]]) (duration: 00m 56s)
*15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
* 11:22 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 & db1131 for maintenance (duration: 00m 57s)
*15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
* 11:00 arturo: [[phab:T223148|T223148]] downtime cloudvirt[1014,1028].eqiad.wmnet and labweb1001.wikimedia.org for 8 hours
*15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512909|Enable extension EntitySchema in production]] (duration: 00m 56s)
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
*15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: [[gerrit:512911|Steal maintenance script user]] (duration: 00m 58s)
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
*15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
*15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
*15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
* 10:50 godog: bootstrap restbase1020-b - [[phab:T219404|T219404]]
*15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912|Steal maintenance script user]] – forgot `git submodule update` before previous sync (duration: 00m 57s)
* 10:27 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - [[phab:T219148|T219148]] (duration: 01m 07s)
*15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:26 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - [[phab:T219148|T219148]]
*15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 08:52 akosiaris: upgrade mathoid to statsd_exporter 0.9 [[phab:T220709|T220709]]
*15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912|Steal maintenance script user]] (duration: 00m 59s)
* 08:48 akosiaris@deploy1001: scap-helm mathoid finished
*15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:48 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
*15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 08:48 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
*15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 08:48 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
*14:57 jbond42: reboot ms-be2016
* 08:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
*14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:37 godog: bootstrap restbase1020-a - [[phab:T219404|T219404]]
*14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 08:32 elukey: depool/restart-nutcracker-pool mw1293/1313 - [[phab:T214275|T214275]]
*14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 08:22 elukey: depool/restart-nutcracker-pool mw1238 - [[phab:T214275|T214275]]
*14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
* 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 (duration: 00m 56s)
*14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
* 07:57 moritzm: installing linux 4.9.168-1+deb9u2~deb8u1 kernel on jessie hosts (no reboots, just installing the new package)
*14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
* 07:45 moritzm: removed intel-microcode 3.20180807a from jessie-wikimedia (superseded by newer version in security.debian.org, which doesn't get picked up by apt due to the higher apt priority of jessie-wikimedia)
*14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 into API (duration: 00m 56s)
*13:50 _joe_: hhvm restarted on mwdebug1001
* 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 00m 57s)
*13:48 _joe_: stopping hhvm on mwdebug1001 for testing
* 06:59 moritzm: installing intel-microcode updates
*13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 05:34 elukey: roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - [[phab:T214275|T214275]]
*13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
* 05:33 marostegui: Stop MySQL on db1104 to clone db1126
*13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]] (duration: 00m 03s)
* 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 56s)
*13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]]
* 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2106, db2110, db2119 into s4 - [[phab:T222772|T222772]] (duration: 00m 56s)
*13:31 gilles@deploy1001: deploy aborted: [[phab:T224388|T224388]] (duration: 00m 01s)
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2106, db2110, db2119 into s4 - [[phab:T222772|T222772]] (duration: 00m 58s)
*13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
* 02:27 onimisionipe: pooling elastic2038 after unbanning - [[phab:T217398|T217398]]
*13:24 urandom: decommissioning restbase1014-c -- [[phab:T223976|T223976]]
*13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
*12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
*12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]] (duration: 00m 04s)
*12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
*12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]] (duration: 00m 06s)
*12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]]
*12:13 raynor: EU SWAT done
*12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512743|Disable the rdf2latex Collection portlet format (T224433)]] (duration: 00m 55s)
*12:00 raynor: EU SWAT re-opened
*11:58 Lucas_WMDE: EU SWAT done
*11:54 Lucas_WMDE: ^ error, no change to wiki
*11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
*11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: [[gerrit:512689|Add maintenance script to create preexisting Schemas]] + [[gerrit:512717|Small maintenance script adjustments]] (duration: 00m 54s)
*11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: [[gerrit:512677|Skip configured IDs]] (duration: 00m 57s)
*11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511753|Add a list of IDs to skip in production]] (duration: 00m 54s)
*11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: [[gerrit:510204|Add feature flag config for breaking Wikibase API change (T223300)]] (duration: 00m 54s)
*11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
*11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512426|Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects]] ([[phab:T223039|T223039]]) (duration: 00m 54s)
*11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
*11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: [[gerrit:512422|Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308)]] (duration: 02m 36s)
*10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 03m 00s)
*10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
*10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
*10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
*09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - [[phab:T201409|T201409]] (duration: 01m 12s)
*09:28 moritzm: installing php5 security updates
*09:00 moritzm: installing ffmpeg security updates
*08:58 gehel: rebooting wdqs nodes for kernel upgrade
*08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob  to PHP7 - [[phab:T219148|T219148]] (duration: 01m 21s)
*08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob  to PHP7 - [[phab:T219148|T219148]]
*08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
*08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - [[phab:T220518|T220518]] [[phab:T213820|T213820]]
*08:40 volans: [[phab:T224448|T224448]] sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
*08:29 volans: restarting gerrit due to stuck threads - [[phab:T224448|T224448]]
*07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
*07:02 mobrovac: decommission restbase1014-b -- [[phab:T223976|T223976]]
*06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - [[phab:T219150|T219150]] (duration: 00m 51s)
*00:38 urandom: decommissioning restbase1014-a -- [[phab:T223976|T223976]]


== 2019-05-15 ==
==2019-05-27==
* 22:16 mutante: phab1003 - start ssh-phab service after adding service IPs
* 22:01 eileen: civicrm update - lost the commit versions but 5.13.4 release
* 21:47 mutante: phab1003 - ip -6 addr del 2620:0:861:ed1a::3:16/128 dev lo - remove extra service IP for phab's separate sshd, duplicated with phab1001 ([[phab:T190568|T190568]])
* 21:24 jforrester@deploy1001: Synchronized wmf-config/MetaContactPages.php: Add movecomsignup contact page on meta [[phab:T218363|T218363]] (duration: 00m 56s)
* 21:23 eileen: civicrm revision changed from {{Gerrit|7d3ef1f2ae}} to {{Gerrit|c69c6e2e6a}}, config revision is {{Gerrit|a099f13a55}}
* 21:00 fdans@deploy1001: Finished deploy [analytics/refinery@ffa4931]: deploying analytics refinery (duration: 15m 31s)
* 20:45 tgr@deploy1001: Finished deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist ([[phab:T213362|T213362]]) (duration: 02m 41s)
* 20:45 fdans@deploy1001: Started deploy [analytics/refinery@ffa4931]: deploying analytics refinery
* 20:42 tgr@deploy1001: Started deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist ([[phab:T213362|T213362]])
* 20:20 robh: rebooting cloudvirt1015 into dell hardware tests per [[phab:T220853|T220853]]
* 20:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@8f28977]: Updating Parsoid to {{Gerrit|6658cad}} (duration: 06m 23s)
* 20:12 arlolra@deploy1001: Started deploy [parsoid/deploy@8f28977]: Updating Parsoid to {{Gerrit|6658cad}}
* 19:42 hashar: group 1 promoted to 1.34.0-wmf.5  apparently without any issue # [[phab:T220730|T220730]]
* 19:03 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.5 (duration: 00m 58s)
* 19:02 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.5
* 18:38 andyrussg@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CentralNotice/: Revert CentralNotice (duration: 01m 00s)
* 17:32 thcipriani: deploy1001:sudo -u www-data /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php
* 17:19 onimisionipe: unban elastic2038 from shard allocation - [[phab:T217398|T217398]]
* 17:19 XenoRyet: updated civicrm from {{Gerrit|4b6d569383}} to {{Gerrit|7d3ef1f2ae}}
* 17:09 elukey: powerup elastic2038 (was down for maintenance)
* 17:01 godog: bootstrap restbase1019-c - [[phab:T219404|T219404]]
* 16:58 bstorm_: [[phab:T212972|T212972]] updated all views on labsdb1012
* 16:50 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings
* 16:40 urandom: bootstrap restbase1019-c - [[phab:T219404|T219404]]
* 16:28 elukey: restart nutcracker on mw2240 to pick up the new config (no more memcached settings)
* 16:26 bstorm_: [[phab:T212972|T212972]] updated all views on labsdb1009
* 16:17 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223166|T223166]] (duration: 00m 56s)
* 16:16 reedy@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/WikimediaEvents/: [[phab:T219128|T219128]] (duration: 01m 13s)
* 16:14 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/WikimediaEvents/: [[phab:T219128|T219128]] (duration: 01m 06s)
* 16:03 jynus: disable puppet on all production databases
* 15:21 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: [[phab:T222980|T222980]] (duration: 00m 57s)
* 14:28 andrewbogott: repooling labweb1002
* 14:16 andrewbogott: depooling labweb1002 to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/
* 14:15 godog: bootstrap restbase1019-b - [[phab:T219404|T219404]]
* 13:21 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on testwikis and mediawikiwiki ([[phab:T188327|T188327]]) (duration: 00m 57s)
* 12:22 Lucas_WMDE: EU SWAT done
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: SWAT: [[gerrit:510217|VisualEditorHooks: Use isVisualAvailable() when changing tabs/editsections]] + [[gerrit:510218|DesktopArticleTarget.init: Allow veaction=edit to override namespace settings (T221892)]] (duration: 01m 15s)
* 12:20 akosiaris: depool esams, network issues
* 11:47 akosiaris@deploy1001: scap-helm mathoid finished
* 11:47 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
* 11:46 akosiaris@deploy1001: scap-helm mathoid upgrade --wait -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
* 11:41 akosiaris@deploy1001: scap-helm citoid finished
* 11:41 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
* 11:41 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
* 11:32 akosiaris@deploy1001: scap-helm citoid finished
* 11:32 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
* 11:31 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
* 11:31 godog: bootstrap restbase1019-a - [[phab:T219404|T219404]]
* 11:29 akosiaris: upgrade to statsd_exporter 0.9 for citoid [[phab:T220709|T220709]]
* 11:27 akosiaris@deploy1001: scap-helm citoid finished
* 11:27 akosiaris@deploy1001: scap-helm citoid cluster staging completed
* 11:27 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
* 10:31 elukey: superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade)
* 10:27 moritzm: installing linux 4.9.168-1+deb9u2 kernel on stretch hosts (no reboots, just installing the new package)
* 10:04 elukey@deploy1001: Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s)
* 10:04 elukey@deploy1001: Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency
* 09:33 hashar: Disable CI castor cache system since the instance is being migrated. Some / most CI jobs might have failed for the last 20 minutes or so [[phab:T223148|T223148]]
* 08:45 elukey@deploy1001: Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s)
* 08:44 elukey@deploy1001: Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32
* 08:36 elukey: stop superset on analytics-tool1003 as prep step for the migration to the new host - [[phab:T212243|T212243]]
* 08:31 moritzm: rebooting mw2164
* 07:33 elukey: restart nutcracker on mw2245 to pick up config changes (removal of memcached config)
* 07:29 elukey: powercycle an-worker1094 (OEM event occurred, checking if temporary)
* 07:21 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove the php7 beta feature [[phab:T219128|T219128]] (duration: 00m 59s)
* 06:24 elukey: force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck
* 01:40 eileen: process control updated - omnigroupmember.load re-enabled
* 01:39 eileen: civicrm revision changed from {{Gerrit|5024c968ed}} to {{Gerrit|4b6d569383}}, config revision is {{Gerrit|a099f13a55}}


== 2019-05-14 ==
*23:19 thcipriani: gerrit back after restarting due to [[phab:T224448|T224448]]
* 20:44 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin (duration: 00m 07s)
*23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
* 20:43 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin
*22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 20:41 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: (no justification provided) (duration: 00m 01s)
*22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]]
* 20:41 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: (no justification provided)
*22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 20:13 chaomodus: restarting gerrit on cobalt to pick up metrics export changes
*22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]]
* 19:37 herron: adding logstash filter truncate plugin to prod logstash collectors
*20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
* 19:28 gehel: shutting down elastic2038 for memory replacement - [[phab:T217398|T217398]]
*20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 19:25 gehel: ban elastic2038 from elasticsearch cluster for memory replacement - [[phab:T217398|T217398]]
*18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: {{Gerrit|66556bf37e8}} / [[phab:T223310|T223310]], [[phab:T223978|T223978]] (duration: 00m 50s)
* 18:21 mutante: mwmaint1002 - deleting /root/home-mwmaint2001 to save space - confirmed we have bacula backups of home on mwmaint2001
*18:06 krinkle@deploy1001: Synchronized errorpages/: {{Gerrit|4ffcbfc2ba3}} (duration: 00m 48s)
* 17:55 mutante: elastic2029 - enable puppet agent - was disabled without reason and nobody seems to have logged in recently
*17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
* 17:54 mutante: elastic2038 - restart nagios-nrpe-server - attempt to fix "CHECK_NRPE STATE UNKNOWN" for a single check
*17:37 andrewbogott: refreshing puppet-compiler facts
* 17:32 mutante: contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again ([[phab:T207707|T207707]])
*16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
* 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@881b22b]: Update chromium-render to {{Gerrit|8cc96e7}} make timeout handler more robust ([[phab:T217724|T217724]]) (duration: 02m 23s)
*16:34 mobrovac: decommission restbase1013-c - [[phab:T223976|T223976]]
* 17:20 mbsantos@deploy1001: Started deploy [proton/deploy@881b22b]: Update chromium-render to {{Gerrit|8cc96e7}} make timeout handler more robust ([[phab:T217724|T217724]])
*15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220402|T220402]]
* 16:30 jynus: stop replication and start table recompression on labsdb1009 [[phab:T222978|T222978]]
*15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220401|T220401]]
* 16:22 godog: statsd_exporter 0.9 upgrade on thumbor - [[phab:T220709|T220709]]
*13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - [[phab:T223518|T223518]]
* 16:04 gilles@deploy1001: Finished deploy [performance/coal@5a32eb2]: [[phab:T221401|T221401]] (duration: 00m 06s)
*11:33 onimisionipe: starting osm initial import on maps2004 - [[phab:T224395|T224395]]
* 16:04 gilles@deploy1001: Started deploy [performance/coal@5a32eb2]: [[phab:T221401|T221401]]
*10:35 mobrovac: decommission restbase1013-b - [[phab:T223976|T223976]]
* 15:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix [[phab:T223281|T223281]] (duration: 00m 55s)
*10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
* 15:51 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix [[phab:T223281|T223281]] (duration: 00m 57s)
*09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage  to PHP7 - [[phab:T219148|T219148]] (duration: 01m 09s)
* 15:49 crusnov@deploy1001: Finished deploy [netbox/deploy@81059c6]: Deploy new reqs for reports (duration: 00m 55s)
*09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage  to PHP7 - [[phab:T219148|T219148]]
* 15:49 crusnov@deploy1001: Started deploy [netbox/deploy@81059c6]: Deploy new reqs for reports
*09:52 _joe_: disabling puppet on mw1261, running some tests for [[phab:T223180|T223180]]
* 15:43 jynus: reload haproxy config @ dbproxy1010, dbproxy1011
*08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
* 15:38 XioNoX: re-activate bgp to telia on cr1-codfw - [[phab:T222967|T222967]]
*08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - [[phab:T224393|T224393]] (duration: 00m 49s)
* 15:33 XioNoX: deactivate bgp to telia on cr1-codfw - [[phab:T222967|T222967]]
*08:03 gehel: depool maps2004 - [[phab:T224395|T224395]]
* 15:19 papaul: shutting down elastic2038 for memory replacement
*07:05 gehel: running nodetool repair on maps2004 - [[phab:T224395|T224395]]
* 15:14 hashar: mw1263: scap pull
*04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
* 14:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.5
*04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 14:50 moritzm: rebooting mw1263 for kernel update
*02:59 urandom: decommissioning restbase1013-a -- [[phab:T223976|T223976]]
* 14:47 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 62m 47s)
* 14:07 _joe_: apt-get clean on mwmaint1002
* 13:44 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
* 13:44 godog: rearm keyholder on deploy and cumin hosts
* 13:27 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 14m 39s)
* 13:12 hashar: train delay, I forgot to sync 1.34.0-wmf.5
* 13:12 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
* 12:37 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: Hot-deploy [[phab:T223023|T223023]] fix {{Gerrit|I1b35b28e42}} for mobile VE edit section switches (duration: 00m 54s)
* 12:10 moritzm: rebooting mw2164 for kernel update
* 11:33 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.24 (duration: 03m 20s)
* 11:30 hashar: Deleting 1.33.0-wmf.24 from deploy1001 # [[phab:T220730|T220730]]
* 11:28 kart_: EU-Mid day SWAT Done.
* 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[gerrit:508818{{!}}Decrease idwiki MT threshold for publishing]] ([[phab:T222782|T222782]]) (duration: 00m 51s)
* 11:23 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.23 (duration: 14m 31s)
* 11:23 jbond42: cumin1001 ~ % sudo cumin A:all '/usr/local/sbin/run-puppet-agent --failed-only'
* 11:18 jbond42: enable puppet, issue fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/510131
* 11:12 ema: pool cp3036 reimaged to ATS [[phab:T222937|T222937]]
* 11:09 hashar: Deleting 1.33.0-wmf.23 from deploy1001 # [[phab:T220730|T220730]]
* 11:09 jbond42: disable puppet
* 10:58 hashar: scap prep 1.34.0-wmf.5 # [[phab:T220730|T220730]]
* 10:16 hashar: Cutting branches for 1.34.0-wmf.5
* 10:01 ema: depool cp3036 and reimage as upload_ats [[phab:T222937|T222937]]
* 09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2034 from config [[phab:T219493|T219493]] (duration: 00m 49s)
* 09:53 marostegui@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 09:52 marostegui: Remove db2034 from tendril and zarcillo - [[phab:T219493|T219493]]
* 09:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2034 from config [[phab:T219493|T219493]] (duration: 00m 50s)
* 09:34 jynus: restart apache on ununpentium
* 09:29 marostegui: Parsercache deployment window FINISHED
* 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy second parsercache key change everywhere after deploying it in batches first [[phab:T210725|T210725]] (duration: 00m 50s)
* 09:15 godog: statsd_exporter 0.9 upgrade on ores - [[phab:T220709|T220709]]
* 09:02 godog: statsd_exporter 0.9 upgrade on logstash - [[phab:T220709|T220709]]
* 08:53 jynus: failing connections over dbproxy1006 to dbproxy1001
* 07:48 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
* 06:45 ema: cp-ats: upgrade trafficserver to 8.0.3-1wm2
* 06:20 ema: cp4021: upgrade trafficserver to 8.0.3-1wm2
* 06:15 ema: upload trafficserver 8.0.3-1wm2 to stretch-wikimedia
* 06:02 marostegui: Deploy parsercache change to eqiad canaries - [[phab:T210725|T210725]]
* 06:01 marostegui: Lock wmf-config deployment on deploy1001 to slowly change parsercache key on eqiad - [[phab:T210725|T210725]]
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache on codfw [[phab:T210725|T210725]] (duration: 00m 54s)
* 01:55 mutante: re-scheduled nginx / HTTP availability icinga checks
* 01:42 mutante: cumin -b 6 'R:git::clone' 'run-puppet-agent -q --failed-only'
* 01:37 mutante: restarting Gerrit to apply 2 config changes - disable DNS reverse lookup (gerrit:508127) & list projects from index (gerrit:508892) - removes blockers for 2.16 upgrade ([[phab:T200739|T200739]])
* 00:32 mutante: restarting wikibugs because it left some channels


== 2019-05-13 ==
==2019-05-26==
* 20:29 ejegg: updated payments-wiki from {{Gerrit|6e0172bac3}} to {{Gerrit|8397ccf9cc}}
* 20:24 halfak@deploy1001: Finished deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]] (duration: 04m 16s)
* 20:20 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]]
* 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis (duration: 00m 03s)
* 20:19 ariel@deploy1001: Started deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis
* 20:04 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]]
* 18:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]] (duration: 00m 49s)
* 18:28 ejegg: updated SmashPig standalone deploy {{Gerrit|22b6982}} Try turning off WSDL caching for Adyen
* 18:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T222954|T222954]] (duration: 00m 49s)
* 18:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]] (duration: 00m 50s)
* 18:17 ottomata: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]]
* 18:12 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223006|T223006]] [[phab:T222740|T222740]] [[phab:T222044|T222044]] (duration: 00m 49s)
* 18:07 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 18:07 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
* 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 18:04 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 18:03 fsero: deleting eventgate-analytics-production releases on codfw
* 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 17:57 fsero: deleting eventgate-analytics and eventgate-analytics-staging releases on staging
* 17:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]] (duration: 00m 50s)
* 17:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]] (duration: 00m 50s)
* 17:10 ottomata: disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]]
* 16:14 Amir1: removing tokipona language terms from items using maintenance script ([[phab:T200432|T200432]])
* 16:00 andrewbogott: reimaging cloudvirt1024 (for the last time I hope)
* 14:33 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
* 14:32 otto@deploy1001: Synchronized wmf-config/LabsServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
* 14:05 moritzm: uploaded puppet 4.8.2-5+wmf1 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia  ([[phab:T219803|T219803]])
* 14:00 elukey: roll restart of aqs on aqs1* to pick up new druid settings
* 13:50 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-fe2*' 'run-puppet-agent'
* 13:46 moritzm: updating puppet on deployment-puppetmaster03 to 4.8.2-5+wmf1 ([[phab:T219803|T219803]])
* 13:39 akosiaris: bump eventgate-analytics chart to 0.0.36. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. [[phab:T220709|T220709]]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 13:36 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on all wikis ([[phab:T188327|T188327]]) (duration: 00m 50s)
* 13:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-be2*' 'run-puppet-agent'
* 13:29 cdanis: swift codfw-prod: deploy {{Gerrit|I1035824d}}
* 13:25 moritzm: uploaded puppetdb 4.4.0-1~wmf2 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia  ([[phab:T219803|T219803]])
* 13:07 akosiaris: bump cxserver chart to 0.0.7. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. [[phab:T220709|T220709]]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
* 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
* 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
* 13:05 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 13:04 arturo: install libjs-jquery from stretch in cloudnet servers [[phab:T222862|T222862]]
* 13:03 arturo: enable puppet in cloudvirt1024 to refresh some apt config [[phab:T222862|T222862]]
* 12:50 moritzm: updating puppetdb on deployment-puppetdb02 to 4.4.0-1~wmf2 ([[phab:T219803|T219803]])
* 12:36 cdanis: root@ms-be2013.codfw.wmnet ~ # umount /srv/swift-storage/sda1 && mount /srv/swift-storage/sda1 && umount /srv/swift-storage/sdb1 && mount /srv/swift-storage/sdb1
* 12:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/resources/src/startup/startup.js: {{Gerrit|I76a2c8d52fa}} (duration: 00m 51s)
* 12:33 cdanis: root@ms-be2013.codfw.wmnet ~ # mount /srv/swift-storage/sdf1
* 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdl1 && sudo mount /srv/swift-storage/sdl1
* 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdf1 && sudo mount /srv/swift-storage/sdf1
* 12:18 cdanis: cdanis@ms-be2015.codfw.wmnet /var/log % sudo mount /srv/swift-storage/sda1
* 12:08 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/lib/includes/Formatters/CachingKartographerEmbeddingHandler.php: [[phab:T223085|T223085]] (duration: 00m 50s)
* 11:59 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/composer.json: [[phab:T215746|T215746]] (duration: 00m 49s)
* 11:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/vendor/: [[phab:T215746|T215746]] (duration: 01m 30s)
* 11:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: [[phab:T222639|T222639]] (duration: 00m 52s)
* 11:04 ema: cp-ats rolling restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509456/
* 10:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/includes/http/HttpRequestFactory.php: [[phab:T222935|T222935]] Hot-deploy fix for HttpRequestFactory (duration: 00m 50s)
* 10:38 jbond42: update puppet5 and facter3 in eqiad
* 10:17 vgutierrez: rebooting cloudvirt1024 - [[phab:T209707|T209707]]
* 09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 [[phab:T217396|T217396]] (duration: 00m 49s)
* 09:33 hashar: Upgrading Zuul 2.5.1-wmf7 -> 2.5.1-wmf9 [[phab:T105474|T105474]]
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully pool db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 50s)
* 07:08 elukey: slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - [[phab:T222866|T222866]]
* 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 50s)
* 06:53 moritzm: installing ghostscript security updates
* 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 49s)
* 06:09 marostegui: Compress s2, s6 and s7 on labsdb1012 - [[phab:T222978|T222978]]
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:41 marostegui: Optimize tables on pc2007
* 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1130 into s5 and db1138 into s4 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1130 into s5 and db1138 into s4 [[phab:T222682|T222682]] (duration: 00m 51s)


== 2019-05-12 ==
*20:39 urandom: decommissioning restbase1012-c -- [[phab:T223976|T223976]]
* 15:32 elukey: rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - [[phab:T222941|T222941]]
*14:09 urandom: decommissioning restbase1012-b -- [[phab:T223976|T223976]]
* 12:14 elukey: restart eventlogging on eventlog1002 - all processors stuck due to kafka python ([[phab:T222941|T222941]])
*13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: [[phab:T187147|T187147]] / {{Gerrit|2be7aa4bc4af36}} (duration: 00m 51s)
* 05:31 marostegui: Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source
*08:01 mobrovac: decommission restbase1012-a - [[phab:T223976|T223976]]


== 2019-05-11 ==
==2019-05-25==
* 18:26 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
* 06:37 elukey: restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated ([[phab:T222941|T222941]])
* 02:01 mutante: actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1


== 2019-05-10 ==
*22:41 urandom: decommissioning restbase1011-c -- [[phab:T223976|T223976]]
* 18:58 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
*22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: [[phab:T222628|T222628]] / {{Gerrit|c735a545df3a}} (duration: 00m 51s)
* 18:51 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
*19:12 andrewbogott: reimaging cloudservices1004 with Stretch
* 18:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'enable-puppet "Puppet breakages on all hosts -- cdanis"'
*13:46 urandom: decommissioning restbase1011-b -- [[phab:T223976|T223976]]
* 18:39 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'disable-puppet "Puppet breakages on all hosts -- cdanis"'
*12:28 godog: bounce thumbor on thumbor1002
* 16:50 reedy@deploy1001: Synchronized dblists/: Update size related dblists (duration: 00m 49s)
*12:21 godog: bounce thumbor on thumbor1002
* 16:31 ebernhardson: drop archive indices from cloudelastic
*11:48 _joe_: restarted thumbor-instances on thumbor1001
* 16:11 ariel@deploy1001: Finished deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run (duration: 00m 05s)
*09:20 mobrovac: decommission restbase1011-b - [[phab:T223976|T223976]]
* 16:11 ariel@deploy1001: Started deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run
*04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
* 16:05 ejegg: moved adyen smashpig job runner to frdev1001
*04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
* 15:25 _joe_: wiped opcache clean on all api, appservers
*00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy [[phab:T224319|T224319]] for VisualEditor switching and auto-restore (duration: 00m 50s)
* 15:05 cdanis: cdanis@mw1239.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 15:05 Krinkle: fix opcache krinkle@mw1268:~$ scap pull
* 15:04 cdanis: cdanis@mw1268.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 15:03 Krinkle: ran 'scap pull' on mw1239.eqiad.wmnet to fix opcache corruption
* 14:56 jbond42: uploaded zuul_2.5.10-wmf9 to jessie-wikimedia
* 14:54 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T99740|T99740]] / {{Gerrit|d9dbecad9c7b}} (duration: 00m 51s)
* 14:33 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f lala.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 13:30 ema: pool cp3038 w/ ATS backend [[phab:T222937|T222937]]
* 12:19 ema: depool cp3038 and reimage as upload_ats [[phab:T222937|T222937]]
* 11:52 jbond42: (un)load edac kernel modules on elastic1029 to test resetting counters
* 11:04 jbond42: restart refinery-eventlogging-saltrotate on an-coord1001
* 10:30 moritzm: installing symfony security updates
* 09:17 jynus: disabling replication lag alerts for backup source hosts on s1, s4, s8 [[phab:T206203|T206203]]
* 07:14 moritzm: uploaded linux-meta 1.21 for jessie-wikimedia (pointing to the new -9 ABI introduced with the 4.9.168 kernel)
* 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 into API (duration: 00m 50s)
* 06:55 ema: swift-fe: rolling restart to enable ensure_max_age [[phab:T222937|T222937]]
* 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 into API (duration: 00m 50s)
* 06:27 ema: ms-fe1005: pool with ensure_max_age [[phab:T222937|T222937]]
* 06:26 ariel@deploy1001: Finished deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis (duration: 00m 05s)
* 06:26 ariel@deploy1001: Started deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis
* 06:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 (duration: 00m 50s)
* 06:18 ema: ms-fe1005: depool and test ensure_max_age [[phab:T222937|T222937]]
* 06:09 _joe_: depooling mw1261 for tests
* 05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2105 db2109 into s3 [[phab:T222772|T222772]] (duration: 00m 49s)
* 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2105 db2109 into s3 [[phab:T222772|T222772]] (duration: 00m 52s)
* 05:40 elukey: execute kafka preferred-replica-election on kafka-jumbo1001 as an attempt to rebalance traffic (1002 seems to have been handling far more than the others for some days)
* 05:32 elukey: restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time
* 05:08 marostegui: Stop MySQL on db1100
* 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 50s)
* 04:56 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2112 (duration: 00m 51s)
* 00:15 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for [[phab:T222471|T222471]] (duration: 00m 37s)
* 00:14 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for [[phab:T222471|T222471]]