You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

imported>Stashbot
(eileen: tools revision changed from a9e7dc1559 to 7b6018a16e)
imported>Stashbot
(cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - T321410)
 
(789 intermediate revisions by 4 users not shown)
== 2020-07-15 ==
== 2022-12-03 ==
* 01:45 eileen: tools revision changed from {{Gerrit|a9e7dc1559}} to {{Gerrit|7b6018a16e}}
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 00:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8f6f660]: 0.3.41 (duration: 15m 10s)
* 00:11 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8f6f660]: 0.3.41


== 2020-07-14 ==
== 2022-12-02 ==
* 19:52 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: [[phab:T252448|T252448]] [[phab:T255190|T255190]] Bump Parsoid to v0.12.0-a23 (duration: 01m 06s)
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:13 ryankemper: all long-running elasticsearch reindex jobs are complete
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 18:09 jforrester@deploy1001: Synchronized dblists/: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Remove the mobilemainpagelegacy dblist (duration: 01m 04s)
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 18:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop loading the mobilemainpagelegacy dblist (duration: 01m 05s)
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop varying wgMFSpecialCaseMainPage (duration: 01m 05s)
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:56 elukey: upgrade spark2 on stat100x to 2.4.4-bin-hadoop2.6-3
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:40 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 15:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:11 sukhe: restart pybal on lvs5004
* 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 15:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 15:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 14:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/skins/Vector/includes/SkinVector.php: [[phab:T257914|T257914]] Restore div wrapper around print footer (duration: 01m 03s)
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 14:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Fix case of directory name (duration: 01m 05s)
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 14:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 14:48 moritzm: rebooting apt1001 for kernel update
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:42 jynus: stopping db1117:3322 (m2) replication temp. for otrs db cloning [[phab:T257928|T257928]]
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 14:26 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 14:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 14:14 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:13 andrewbogott: upgrading wikitech-static to mw 1.34.2
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 14:11 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 13:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11900 and previous config saved to /var/cache/conftool/dbconfig/20200714-132823-marostegui.json
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11899 and previous config saved to /var/cache/conftool/dbconfig/20200714-132742-marostegui.json
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 13:27 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:24 jbond42: reboot dns1001
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 13:22 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:22 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:18 jbond42: reboot dns1002
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:16 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 13:13 jbond42: reboot dns2002
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 13:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 13:13 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 13:13 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 13:10 jbond42: reboot dns2001
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:09 jynus: dropping all databases from db1133
* 13:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 13:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:01 jbond42: rebooting dns3002
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 13:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:58 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 12:57 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing [[phab:T257887|T257887]] (duration: 01m 02s)
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 12:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 12:24 jbond42: route ns0.wikimedia.org to codfw for reboot
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 12:20 moritzm: installing xen security updates (client-side tools/libs)
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 12:19 jbond42: re-enable puppet fleet
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 12:07 jbond42: disable puppet fleet wide to reboot puppetdb's
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 12:07 jbond42: disable puppet to reboot puppetdb's
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 12:01 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.41
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for query plan checks [[phab:T238966|T238966]] ', diff saved to https://phabricator.wikimedia.org/P11898 and previous config saved to /var/cache/conftool/dbconfig/20200714-113612-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 11:35 _joe_: restart pybal on lvs2009 [[phab:T257887|T257887]]
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 11:31 _joe_: restart pybal on lvs2010 [[phab:T257887|T257887]]
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 11:25 _joe_: restart pybal on lvs1015 [[phab:T257887|T257887]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 11:22 _joe_: restart pybal on lvs1016
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 10:59 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 10:56 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2005.codfw.wmnet
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 10:52 volans: powerdown wtp2005, hardware issue - [[phab:T257903|T257903]]
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:47 volans@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:45 effie: depool wtp2005
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 10:32 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 10:18 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:14 James_F: Running AbuseFilter's updateVarDumps for group1 [[phab:T246539|T246539]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:13 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11897 and previous config saved to /var/cache/conftool/dbconfig/20200714-094449-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11896 and previous config saved to /var/cache/conftool/dbconfig/20200714-094354-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 09:39 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Add REL1_35 as a candidate release (duration: 01m 06s)
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 09:05 jforrester@deploy1001: Finished scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 51m 41s)
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for PDU upgrade [[phab:T257871|T257871]]', diff saved to https://phabricator.wikimedia.org/P11895 and previous config saved to /var/cache/conftool/dbconfig/20200714-084033-marostegui.json
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 jforrester@deploy1001: Started scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 08:05 akosiaris: restart pybal on lvs2009
* 08:03 _joe_: restart pybal on lvs1016
* 08:02 akosiaris: restart pybal on lvs2007
* 08:01 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=restbase2009.codfw.wmnet
* 08:00 _joe_: restart pybal on lvs1015
* 08:00 akosiaris: restart pybal on lvs2010 after merging https://gerrit.wikimedia.org/r/612487
* 07:52 jforrester@deploy1001: sync aborted: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 02m 14s)
* 07:50 jforrester@deploy1001: Started scap: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 07:48 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 01m 06s)
* 07:32 oblivian@deploy1001: sync-file aborted: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 00m 20s)
* 07:31 oblivian@deploy1001: Scap failed!: 7/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 07:27 moritzm: installing libtasn1-6 security updates
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P11894 and previous config saved to /var/cache/conftool/dbconfig/20200714-071233-marostegui.json
* 07:04 marostegui: Drop gerrit, gerritro, gerrittest users from m2 databases - [[phab:T255715|T255715]]
* 06:58 marostegui: Stop mysql on db1131 for HW maintenance
* 06:56 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:54 jforrester@deploy1001: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) (duration: 24m 59s)
* 06:54 jforrester@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 06:53 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:53 marostegui: Deploy MCR schema change on s5 primary master [[phab:T238966|T238966]]
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11893 and previous config saved to /var/cache/conftool/dbconfig/20200714-065229-marostegui.json
* 06:29 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.41
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease a bit db1088 load', diff saved to https://phabricator.wikimedia.org/P11891 and previous config saved to /var/cache/conftool/dbconfig/20200714-051551-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for HW maintenance', diff saved to https://phabricator.wikimedia.org/P11890 and previous config saved to /var/cache/conftool/dbconfig/20200714-050931-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 from api', diff saved to https://phabricator.wikimedia.org/P11889 and previous config saved to /var/cache/conftool/dbconfig/20200714-050912-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1093 to s6 master and remove read-only from s6 [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11888 and previous config saved to /var/cache/conftool/dbconfig/20200714-050157-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11887 and previous config saved to /var/cache/conftool/dbconfig/20200714-050039-marostegui.json
* 05:00 marostegui: Starting s6 failover from db1131 to db1093 - [[phab:T257253|T257253]]
* 04:59 James_F: 1.35.0-wmf.41 branched at {{Gerrit|7d04152db4f8ea9a459511bed8117101d9bb4602}}
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11886 and previous config saved to /var/cache/conftool/dbconfig/20200714-043907-marostegui.json
* 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 in preparation for failover', diff saved to https://phabricator.wikimedia.org/P11885 and previous config saved to /var/cache/conftool/dbconfig/20200714-041548-marostegui.json
* 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11884 and previous config saved to /var/cache/conftool/dbconfig/20200714-041440-marostegui.json
* 01:23 ryankemper: Started long-running Elasticsearch reindex of `eqiad`, `codfw`, and `cloudelastic`. tmux session `reindex` under `ryankemper` on `mwmaint1002`
* 01:20 cdanis: ❌cdanis@lvs1015.eqiad.wmnet ~ 🕤🍺 sudo systemctl restart pybal.service
* 01:15 cdanis: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:14 cdanis: ✔️ cdanis@lvs2009.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:01 cdanis: ✔️ cdanis@lvs2010.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service


== 2020-07-13 ==
== 2022-12-01 ==
* 23:06 mutante: releases* delete /usr/local/sbin/sync-* scripts created by rsync::quickdatacopy and let puppet recreate the ones still needed
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 22:27 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS


== 2020-07-11 ==
== Archives ==
* 19:16 qchris: Restarting Gerrit on gerrit1001 to switch to new gerrit.war and zuul plugin
* 19:16 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001 (duration: 00m 07s)
* 19:15 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001
* 19:08 qchris: Restarting Gerrit on gerrit2001 to switch to new gerrit.war and zuul plugin
* 18:55 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001 (duration: 00m 10s)
* 18:55 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001
 
== 2020-07-10 ==
* 21:52 ryankemper: Started long-running reindex of Elasticsearch indices in `eqiad`, `codfw`, and `dewiki` on `mwmaint1002` under tmux session `reindex` for user `ryankemper`
* 20:26 jgleeson: updated fundraising-tools from {{Gerrit|08ba1f6177}} to {{Gerrit|f8e424fe32}}
* 19:02 mutante: removing firewall hole for gerrit -> mysql servers on dbproxy servers for misc db's
* 18:44 mutante: kubernetes1004 - started nagios-nrpe-server
* 17:57 ebernhardson: change loginwiki password for Cindy-the-browser-test-bot, no email account was associated to allow for normal reset.
* 17:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I63fcea7737}} (duration: 00m 57s)
* 16:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN) (duration: 00m 08s)
* 15:56 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN)
* 15:44 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist (duration: 15m 17s)
* 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 15:29 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist
* 15:19 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 14:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 13:41 godog: bounce ms-be1037, not quite responsive
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11860 and previous config saved to /var/cache/conftool/dbconfig/20200710-123604-marostegui.json
* 12:20 reedy@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Score/: Make Score errors use a specific css class (duration: 00m 58s)
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'Finish repooling es1021, and remove weight from es1010 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11859 and previous config saved to /var/cache/conftool/dbconfig/20200710-102147-kormat.json
* 09:49 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1021 after reimage @ 50% [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11858 and previous config saved to /var/cache/conftool/dbconfig/20200710-094954-kormat.json
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11857 and previous config saved to /var/cache/conftool/dbconfig/20200710-085157-marostegui.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P11856 and previous config saved to /var/cache/conftool/dbconfig/20200710-085112-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107', diff saved to https://phabricator.wikimedia.org/P11855 and previous config saved to /var/cache/conftool/dbconfig/20200710-085040-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P11853 and previous config saved to /var/cache/conftool/dbconfig/20200710-082346-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11852 and previous config saved to /var/cache/conftool/dbconfig/20200710-082329-marostegui.json
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11851 and previous config saved to /var/cache/conftool/dbconfig/20200710-080912-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119', diff saved to https://phabricator.wikimedia.org/P11850 and previous config saved to /var/cache/conftool/dbconfig/20200710-080854-marostegui.json
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11849 and previous config saved to /var/cache/conftool/dbconfig/20200710-080843-kormat.json
* 08:01 kormat@cumin1001: dbctl commit (dc=all): 'Reset es2020/es2021 to correct weights after master switch [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11848 and previous config saved to /var/cache/conftool/dbconfig/20200710-080133-kormat.json
* 08:00 moritzm: installing cron security updates on jessie (stretch/buster already fixed)
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P11847 and previous config saved to /var/cache/conftool/dbconfig/20200710-075608-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11846 and previous config saved to /var/cache/conftool/dbconfig/20200710-075500-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079', diff saved to https://phabricator.wikimedia.org/P11845 and previous config saved to /var/cache/conftool/dbconfig/20200710-075431-marostegui.json
* 07:44 kormat: reimaging es1021 to buster [[phab:T257284|T257284]]
* 07:43 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1021 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11844 and previous config saved to /var/cache/conftool/dbconfig/20200710-074326-kormat.json
* 07:41 jbond@deploy1001: Finished deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors (duration: 00m 05s)
* 07:41 jbond@deploy1001: Started deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors
* 07:32 moritzm: installing e2fsprogs security updates on jessie (stretch/buster already fixed)
* 07:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:14 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:13 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11843 and previous config saved to /var/cache/conftool/dbconfig/20200710-065751-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11841 and previous config saved to /var/cache/conftool/dbconfig/20200710-063818-marostegui.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134', diff saved to https://phabricator.wikimedia.org/P11840 and previous config saved to /var/cache/conftool/dbconfig/20200710-063746-marostegui.json
* 06:35 marostegui: Compress InnoDB on db1124:3311 (Sanitarium - lag will appear on s1 on labsdb) - [[phab:T254462|T254462]]
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P11839 and previous config saved to /var/cache/conftool/dbconfig/20200710-044428-marostegui.json
* 01:44 mutante: LDAP - adding coka to wmde and nda ([[phab:T257038|T257038]])
* 00:47 Reedy: truncated labswiki.interwiki table (outdated and unnecessary)
 
== 2020-07-09 ==
* 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2c2dea832}} (duration: 00m 56s)
* 21:52 tgr: all sessions have been invalidated due to [[phab:T256395|T256395]]
* 20:58 eileen: https://phabricator.wikimedia.org/T253152
* 19:16 herron: upgraded eqiad elk7 cluster from 7.4.2 to 7.8.0 [[phab:T234854|T234854]]
* 19:05 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 18:51 elukey: update spark2 to 2.4.4-bin-hadoop2.6-3 for buster-wikimedia
* 18:44 mutante: stat1004, stat1006, stat1007 - upgrading git-review package from 1.25 to 1.27 so that it keeps working with new Gerrit 3.2 ([[phab:T257609|T257609]])
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f2557f848e99facaa62ca6b3a948cc3e32c32a3}}: Updating config for Readers Web affinity quicksurvey ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 17:42 chaomodus: codfw frack management dns automation deployment complete [[phab:T233183|T233183]]
* 17:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:36 James_F: Synchronized wmf-config/CommonSettings.php: ExtensionDistribution: Drop REL1_33, EOL'ed [[phab:T256087|T256087]]
* 17:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:35 moritzm: rebooting moscovium for kernel update
* 17:33 chaomodus: deploying frack codfw management dns automation
* 17:32 crusnov@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:28 crusnov@cumin2001: START - Cookbook sre.dns.netbox
* 17:27 moritzm: rebooting planet1002 (planet.wikimedia.org) for kernel update
* 17:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:10 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 04s)
* 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 05s)
* 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:29 papaul: replacing msw-b1,b2,b3 and b4
* 14:03 moritzm: installing libtirpc security updates
* 13:45 moritzm: installing gnutls28 security updates
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P11831 and previous config saved to /var/cache/conftool/dbconfig/20200709-133134-marostegui.json
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:29 moritzm: rebooting puppetboard1001 (puppetboard.wikimedia.org) for kernel update
* 13:15 moritzm: installing ffmpeg security updates
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11830 and previous config saved to /var/cache/conftool/dbconfig/20200709-131039-marostegui.json
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:57 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:56 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:56 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:54 moritzm: rebooting install* servers for kernel security update
* 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:38 moritzm: rebooting urldownloader1001/2001 for kernel update (failed over, these are now the inactive ones)
* 12:23 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:22 moritzm: rebooting dbmonitor1001 / tendril.wikimedia.org for kernel update
* 12:11 XioNoX: enable asw2-b-eqiad:ae3 (to cloudsw1-c8) - [[phab:T251632|T251632]]
* 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:50 moritzm: rebooting debmonitor1001 for kernel update
* 11:42 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Translate/tag/SpecialPageTranslation.php: {{Gerrit|6541d3ff51f52fe8a1bdbfa86022f8d97d6c7680}}: DeprecatablePropertyArray: Use MW_VERSION instead of array_key_exists ([[phab:T257531|T257531]]) (duration: 01m 05s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3a7c1c33e58637437f819edf039008a00dc5be27}}: Rename namespace on kn.wikipedia.org ([[phab:T255337|T255337]]) (duration: 01m 04s)
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a3c1f94a702b527842ed4f34d8bf41b26235e64}}: Add *.oireachtas.ie to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T256543|T256543]]) (duration: 01m 04s)
* 11:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6f442c6900524482806aeb1b5162e65bf7c97ac}}: Enable Quicksurveys for Desktop Improvements Project ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 11:01 vgutierrez: restart ats-tls on cp1085
* 10:55 _joe_: restarting php7.2-fpm on mw1282, workers failing with sigill
* 10:54 _joe_: depool mw1282
* 10:54 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:34 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:23 _joe_: rolling restart the remaining restbases in eqiad, and all of codfw
* 10:22 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:09 _joe_: restarting restbase on rb1020-22
* 09:53 _joe_: restarting restbase on restbase1024,1023
* 09:36 _joe_: restarting restbase on rb1026,1027 to switch to proton on k8s
* 09:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:28 _joe_: restarting restbase on restbase1025 to pick up the switch to k8s of proton
* 09:27 godog: bounce thanos-compact on thanos-fe2001
* 09:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P11828 and previous config saved to /var/cache/conftool/dbconfig/20200709-085228-marostegui.json
* 08:44 marostegui: Stop haproxy on dbproxy1017 before upgrading to buster - [[phab:T255408|T255408]]
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11827 and previous config saved to /var/cache/conftool/dbconfig/20200709-082355-marostegui.json
* 08:23 moritzm: imported osm2pgsql 0.96.0+ds-1~bpo9+1 to "main" component [[phab:T256877|T256877]]
* 08:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 08:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 08:13 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 08:11 XioNoX: disable igmp snooping on msw1-codfw
* 07:59 marostegui: Stop db1117:3322 to clone db1084, this will trigger haproxy alerts - [[phab:T257540|T257540]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11825 and previous config saved to /var/cache/conftool/dbconfig/20200709-075749-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11824 and previous config saved to /var/cache/conftool/dbconfig/20200709-053905-marostegui.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl', diff saved to https://phabricator.wikimedia.org/P11823 and previous config saved to /var/cache/conftool/dbconfig/20200709-053206-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11822 and previous config saved to /var/cache/conftool/dbconfig/20200709-051826-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11821 and previous config saved to /var/cache/conftool/dbconfig/20200709-051355-marostegui.json
* 05:11 marostegui: Remove revision triggers from db2093:3315 [[phab:T238966|T238966]]
* 05:10 marostegui: Deploy schema change on s5 codfw, lag will be generated - [[phab:T238966|T238966]]
* 01:43 tzatziki: reset email for GseSro
* 00:58 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'
* 00:49 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'
 
== 2020-07-08 ==
* 21:56 mutante: deleting files from releases2001 that are not existing on releases1001 to make them mirrors. rsync with --delete and the command from quickdatacopy class ([[phab:T247652|T247652]])
* 21:55 mutante: rsyncing releases files from releases1001 to releases2002 and releases1002. deleting files from releases2002 not existing on releases1002 to make them mirrors ([[phab:T247652|T247652]])
* 20:59 cstone: civicrm revision changed from {{Gerrit|d73ee2e73f}} to {{Gerrit|8b09c87ce2}}
* 20:27 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 20:08 Amir1_: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 19:18 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]] (duration: 01m 04s)
* 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|091442cf035a6d76f1211291afbb3193c513595d}}: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T256518|T256518]]) (duration: 01m 04s)
* 18:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2e5943ddb30e08607a9ffb6ed05a042e8367e2e1}}: Add scan-bugs.org to $wgCopyUploadsDomains ([[phab:T256569|T256569]]) (duration: 01m 04s)
* 18:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f42cdf2}}: Change bnwiki logo ([[phab:T255328|T255328]]) (duration: 01m 04s)
* 18:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Cleanup: remove temporary wmgDisableHTCP variable gerrit:607596 [[phab:T250781|T250781]] IS.php (duration: 01m 01s)
* 18:20 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] CS.php (duration: 01m 03s)
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/wikitech.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] wikitech.php (duration: 01m 04s)
* 18:17 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] reverse-proxy.php (duration: 01m 04s)
* 18:11 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]], IS.php (duration: 01m 03s)
* 18:04 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]] (duration: 01m 04s)
* 17:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 17:16 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 16:57 _joe_: restarting restbase across the fleet to transition to using envoy
* 16:40 _joe_: restarting restbase on restbase2010 to route calls to mediawiki, parsoid via envoy
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:22 jgleeson: updated fundraising-tools from {{Gerrit|a244e0e85f}} --> {{Gerrit|f5b8528214}}
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:12 moritzm: rebooting people1002 (people.wikimedia.org) for kernel security update
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:46 moritzm: installing isc-dhcp security updates
* 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 14:31 moritzm: installing gdk-pixbuf security updates
* 14:26 _joe_: repooling mw1346
* 14:24 _joe_: php7adm /opcache-free on mw1346
* 14:15 jbond42: switch icinga authentication to CAS SSO
* 14:12 _joe_: depooling mw1346
* 14:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:04 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:04 moritzm: rebooting idp-test1001 for kernel update
* 13:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.stop-cluster (exit_code=97)
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 13:31 jynus: replacing ssh key for ci_docroot at deploy1001
* 13:31 moritzm: imported git 2.20.1-2+deb10u3~wmf1 for stretch-wikimedia component/git [[phab:T257308|T257308]]
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 13:00 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 12:41 marostegui: Deploy schema change on s7 codfw, lag is expected
* 12:17 xionox-tmp: rollout less frequent option-refresh-rate - [[phab:T240658|T240658]]
* 12:01 xionox-tmp: renumber eqiad NTT link - [[phab:T254877|T254877]]
* 11:42 awight: EU BACON complete
* 11:41 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610234{{!}}Undeploy graphoid for phase 1 wikis (T257402)]] (duration: 01m 03s)
* 11:31 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610268{{!}}Add nature.com to commonswiki wgCopyUploadDomains (T254342)]] (duration: 01m 03s)
* 11:29 moritzm: installing freetype security updates
* 11:26 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609991{{!}}[hiwikibooks] Translate sitename for hi.wikibooks (T256587)]] (duration: 01m 03s)
* 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609990{{!}}[arwiki] Grant 'patrolmarks' to all (T257106)]] (duration: 01m 04s)
* 11:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:18 moritzm: installing libgcrypt20 security updates
* 11:16 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:07 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 2/2 (duration: 01m 03s)
* 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 1/2 (duration: 01m 16s)
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11818 and previous config saved to /var/cache/conftool/dbconfig/20200708-110546-marostegui.json
* 10:51 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:50 akosiaris: apply calico egress policies
* 10:50 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:45 moritzm: installing json-c security updates
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11817 and previous config saved to /var/cache/conftool/dbconfig/20200708-102553-marostegui.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11816 and previous config saved to /var/cache/conftool/dbconfig/20200708-102500-marostegui.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11815 and previous config saved to /var/cache/conftool/dbconfig/20200708-101313-marostegui.json
* 09:58 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 09:56 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 09:50 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11814 and previous config saved to /var/cache/conftool/dbconfig/20200708-094539-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11813 and previous config saved to /var/cache/conftool/dbconfig/20200708-092650-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11812 and previous config saved to /var/cache/conftool/dbconfig/20200708-092627-marostegui.json
* 09:24 xionox-tmp: renumber eqord NTT link - [[phab:T254877|T254877]]
* 09:18 xionox-tmp: remove eqord-eqiad tunnel - [[phab:T254877|T254877]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11811 and previous config saved to /var/cache/conftool/dbconfig/20200708-091557-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11810 and previous config saved to /var/cache/conftool/dbconfig/20200708-085745-marostegui.json
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11809 and previous config saved to /var/cache/conftool/dbconfig/20200708-085024-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11808 and previous config saved to /var/cache/conftool/dbconfig/20200708-084227-marostegui.json
* 08:40 moritzm: upgrading docker on remaining buster hosts
* 08:38 hashar: Upgraded docker.io on contint1001 and contint2001
* 08:28 marostegui: Remove dbproxy1003 grants from misc hosts [[phab:T231280|T231280]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11807 and previous config saved to /var/cache/conftool/dbconfig/20200708-082624-marostegui.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11806 and previous config saved to /var/cache/conftool/dbconfig/20200708-082040-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11805 and previous config saved to /var/cache/conftool/dbconfig/20200708-081647-marostegui.json
* 08:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2020 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11804 and previous config saved to /var/cache/conftool/dbconfig/20200708-081519-kormat.json
* 08:00 marostegui: Failover m1 from db1097 to db1080 - [[phab:T256717|T256717]]
* 07:57 kormat: reimaging es2020 to buster [[phab:T257284|T257284]]
* 07:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11803 and previous config saved to /var/cache/conftool/dbconfig/20200708-074939-marostegui.json
* 07:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:48 jynus: stop bacula-director on backup1001 in preparation for m1 switchover [[phab:T256717|T256717]]
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:47 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:45 moritzm: installing PHP 7.3 security updates
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11802 and previous config saved to /var/cache/conftool/dbconfig/20200708-073548-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11801 and previous config saved to /var/cache/conftool/dbconfig/20200708-073037-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11800 and previous config saved to /var/cache/conftool/dbconfig/20200708-073011-marostegui.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11799 and previous config saved to /var/cache/conftool/dbconfig/20200708-072431-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11798 and previous config saved to /var/cache/conftool/dbconfig/20200708-070921-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11797 and previous config saved to /var/cache/conftool/dbconfig/20200708-070432-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11796 and previous config saved to /var/cache/conftool/dbconfig/20200708-070403-marostegui.json
* 06:47 marostegui: start topology changes on m1 [[phab:T256717|T256717]]
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11795 and previous config saved to /var/cache/conftool/dbconfig/20200708-064354-marostegui.json
* 06:36 marostegui: Deploy schema change on s2 primary master db1122 [[phab:T238966|T238966]]
* 06:18 _joe_: rolling restart of restbase to pick up the proton url change
* 03:36 andrew@deploy1001: Finished deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130 (duration: 03m 44s)
* 03:32 andrew@deploy1001: Started deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130
 
== 2020-07-07 ==
* 22:41 mutante: new Wikimedia Annual Report 2019 now available on annual.wikimedia.org
* 21:29 andrew@deploy1001: Finished deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130 (duration: 03m 35s)
* 21:25 andrew@deploy1001: Started deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130
* 21:10 andrew@deploy1001: Finished deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130 (duration: 03m 26s)
* 21:07 andrew@deploy1001: Started deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130
* 20:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2 (duration: 09m 15s)
* 20:32 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2
* 20:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009 (duration: 14m 28s)
* 20:24 mutante: kubernetes1003 - starting nagios-nrpe-server
* 20:23 mutante: kubernetes1001 - starting nagios-nrpe-server
* 20:17 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:27 mutante: destroying VM gerrit1002 - decom cookbook
* 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 19:04 mutante: contint2001 - move /var/lib/zuul/.ssh/known_hosts to root and run puppet to recreate it
* 18:38 andrew@deploy1001: Finished deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130 (duration: 03m 18s)
* 18:35 andrew@deploy1001: Started deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130
* 18:27 andrew@deploy1001: Finished deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies (duration: 03m 26s)
* 18:23 andrew@deploy1001: Started deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies
* 18:10 krinkle@deploy1001: Synchronized w/: remove untracked test cookie file (duration: 01m 04s)
* 18:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.40/includes/Revision/RevisionStore.php: {{Gerrit|I8f986daeab4}} (duration: 01m 05s)
* 17:59 herron: imported (logstash{{!}}kibana{{!}}elasticsearch)-oss-7.8.0 into buster-wikimedia thirdparty/elastic78
* 17:54 hnowlan: finished removing restbase2009 from cassandra pool
* 17:06 hnowlan: removed restbase2009-b from cassandra pool, removing restbase2009-c
* 16:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Wikibase: Backport: [[gerrit:610086{{!}}Revert "Don’t load $wgWBClientSettings in WikibaseClient.php" (T257296)]] (duration: 01m 10s)
* 15:49 hnowlan: running nodetool removenode for restbase2009-a
* 15:38 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 15:27 elukey: root-tmux on cumin1001 - cumin 'c:profile::mediawiki::mcrouter_wancache' '/usr/local/sbin/restart-mcrouter' -b 2 -s 5 - roll restart of mw-mcrouter to pick up new settings - [[phab:T255511|T255511]]
* 15:13 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:06 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:01 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:58 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus (duration: 00m 04s)
* 14:58 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus
* 14:53 _joe_: restarted restbase on restbase2022 after removing restbase2009 from the cassandra seeds
* 14:48 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:47 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:30 papaul: replacing msw-a5,a6,a7 and a8
* 14:30 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:16 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: (no justification provided) (duration: 00m 09s)
* 14:16 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: (no justification provided)
* 13:38 _joe_: rolling restart of restbase to pick up using envoy
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:29 XioNoX: cr2-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 13:24 XioNoX: cr1-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Promote es2021 to es4 master [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11789 and previous config saved to /var/cache/conftool/dbconfig/20200707-131524-kormat.json
* 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:44 kormat: starting (codfw) es5 failover from es2020 to es2021 [[phab:T257284|T257284]]
* 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Set es2021 to weight 50 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11787 and previous config saved to /var/cache/conftool/dbconfig/20200707-123003-kormat.json
* 12:12 jforrester@deploy1001: Finished scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]] (duration: 33m 09s)
* 12:01 marostegui: Deploy schema change on labswiki (wikitech) master - [[phab:T253276|T253276]]
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11786 and previous config saved to /var/cache/conftool/dbconfig/20200707-115838-marostegui.json
* 11:39 jforrester@deploy1001: Started scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]]
* 11:38 jforrester@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "jforrester"; reason is "testwikis wikis to 1.35.0-wmf.40" (duration: 00m 00s)
* 11:33 moritzm: installing PHP 7.0 security updates
* 11:29 marostegui: Deploy schema change on db1082, this will create lag on s5 labs
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11784 and previous config saved to /var/cache/conftool/dbconfig/20200707-112926-marostegui.json
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11783 and previous config saved to /var/cache/conftool/dbconfig/20200707-112830-marostegui.json
* 11:26 godog: test bumping logstash7 batch size to 256
* 11:17 moritzm: prune PHP 7.0 packages from mwdebug1001/2001/2002
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11782 and previous config saved to /var/cache/conftool/dbconfig/20200707-110506-marostegui.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11781 and previous config saved to /var/cache/conftool/dbconfig/20200707-110412-marostegui.json
* 10:57 moritzm: prune PHP 7.0 packages from mw2190-mw2214
* 10:46 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.40
* 10:44 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.38 (duration: 17m 23s)
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11780 and previous config saved to /var/cache/conftool/dbconfig/20200707-103255-marostegui.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11779 and previous config saved to /var/cache/conftool/dbconfig/20200707-102757-marostegui.json
* 10:26 moritzm: prune PHP 7.0 packages from mw2135-mw2147
* 10:12 addshore@deploy1001: Synchronized wmf-config/config/testcommonswiki.yaml: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT2/2 (duration: 00m 55s)
* 10:11 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 00s)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11778 and previous config saved to /var/cache/conftool/dbconfig/20200707-101043-marostegui.json
* 10:10 addshore@deploy1001: Synchronized dblists/wikidataclient-test.dblist: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 56s)
* 10:08 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 36s)
* 10:06 elukey: decommission archiva1001
* 10:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11777 and previous config saved to /var/cache/conftool/dbconfig/20200707-100328-marostegui.json
* 10:03 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11776 and previous config saved to /var/cache/conftool/dbconfig/20200707-095443-marostegui.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11775 and previous config saved to /var/cache/conftool/dbconfig/20200707-095428-marostegui.json
* 09:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609971]] [[phab:T257266|T257266]] Enable sitelinks to testcommons from test wikidata sites (duration: 00m 56s)
* 09:40 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2021 after reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11774 and previous config saved to /var/cache/conftool/dbconfig/20200707-094017-kormat.json
* 09:37 addshore@deploy1001: Synchronized wmf-config: [[gerrit:609986]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: Remove config option wmgUseEntitySourceBasedFederation (take2) (duration: 00m 57s)
* 09:36 _joe_: errata: restbase2010, not 2009
* 09:36 _joe_: applying the new configuration using the service proxy to restbase2009 too
* 09:34 godog: bounce logstash on logstash1023
* 09:33 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:609645]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: stop using wmgUseEntitySourceBasedFederation (take2) (duration: 00m 59s)
* 09:33 _joe_: depooling restbase1025 while we fix the troubled relationship between envoy and proton
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11773 and previous config saved to /var/cache/conftool/dbconfig/20200707-093345-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1024 as it is the current master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11772 and previous config saved to /var/cache/conftool/dbconfig/20200707-092635-marostegui.json
* 09:24 James_F: 1.35.0-wmf.40 was branched at {{Gerrit|88ecd6df00a46e432c06c1cf40d5098128abc4d8}} for [[phab:T256668|T256668]]
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11771 and previous config saved to /var/cache/conftool/dbconfig/20200707-092357-marostegui.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11770 and previous config saved to /var/cache/conftool/dbconfig/20200707-091015-marostegui.json
* 08:33 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11769 and previous config saved to /var/cache/conftool/dbconfig/20200707-083144-marostegui.json
* 08:30 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 XioNoX: cr2-codfw> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 08:22 XioNoX: cr2-eqsin> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:19 XioNoX: cr2-eqord> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11768 and previous config saved to /var/cache/conftool/dbconfig/20200707-081909-marostegui.json
* 08:18 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.change-distro (exit_code=97)
* 08:17 XioNoX: cr2-eqdfw> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:15 XioNoX: cr3-knams> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:15 hashar: upgrading and restarting CI Jenkins on contint2001 # [[phab:T256978|T256978]]
* 08:12 XioNoX: cr4-ulsfo> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11767 and previous config saved to /var/cache/conftool/dbconfig/20200707-080914-kormat.json
* 07:50 marostegui: Stop MySQL on db1074 to deploy schema change and remove triggers - [[phab:T238966|T238966]]
* 07:45 _joe_: restarting restbase again on rb1025
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P11766 and previous config saved to /var/cache/conftool/dbconfig/20200707-074435-marostegui.json
* 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 and db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11765 and previous config saved to /var/cache/conftool/dbconfig/20200707-073918-marostegui.json
* 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 _joe_: restarting restbase on restbase1025, reaching proton via envoy for now
* 07:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (forgot to git rebase so the last sync was a no-op) (duration: 00m 56s)
* 07:27 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 07:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (duration: 00m 53s)
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11764 and previous config saved to /var/cache/conftool/dbconfig/20200707-072703-marostegui.json
* 07:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:23 lucaswerkmeister-wmde@deploy1001: Synchronized dblists/wikidataclient.dblist: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:609642{{!}}Revert "Wikibase: stop using wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 55s)
* 07:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609641{{!}}Revert "Wikibase: Remove config option wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 57s)
* 07:10 _joe_: restart restbase on restbase1025 to pick up the switch to https for cxserver
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11762 and previous config saved to /var/cache/conftool/dbconfig/20200707-063737-marostegui.json
* 06:29 marostegui: Reimage es1023 to Buster [[phab:T255755|T255755]]
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1136 some weight back into main traffic [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11761 and previous config saved to /var/cache/conftool/dbconfig/20200707-062008-marostegui.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11760 and previous config saved to /var/cache/conftool/dbconfig/20200707-061849-marostegui.json
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Enable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 entirely [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11759 and previous config saved to /var/cache/conftool/dbconfig/20200707-051620-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11758 and previous config saved to /var/cache/conftool/dbconfig/20200707-051236-marostegui.json
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:01 marostegui: Starting es failover from es1023 to es1024 - [[phab:T255755|T255755]]
* 01:05 ejegg: turned on debug logging for Adyen SmashPig
* 00:22 cstone: civicrm revision changed from {{Gerrit|a48caf0f37}} to {{Gerrit|d73ee2e73f}}
 
== 2020-07-06 ==
* 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable sidebar instrumentation on test wikipedia (duration: 00m 56s)
* 23:32 eileen: process-control config revision is {{Gerrit|3fe6753e56}}
* 23:22 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change some zh canonical namespaces. Don't index NS_USER on hywiki (duration: 00m 58s)
* 22:59 eileen: tools revision changed from {{Gerrit|e974147f27}} to {{Gerrit|73557b8038}}
* 22:14 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@65502b2]: 0.3.40 (duration: 18m 58s)
* 21:55 ryankemper@deploy1001: Started deploy [wdqs/wdqs@65502b2]: 0.3.40
* 21:52 hashar: Upgraded Jenkins on releases1002 and releases2002 # [[phab:T256978|T256978]]
* 21:41 mutante: upgrading jenkins on releases1001 and releases2001 ([[phab:T256980|T256980]])
* 21:37 mutante: importing jenkins 2.235.1 into APT repo for both stretch and buster [[phab:T256980|T256980]]
* 20:08 eileen: tools revision is {{Gerrit|e974147f27}}
* 19:41 qchris: Enabling puppet on gerrit1002 again to catch up with puppetmaster.
* 18:56 addshore: backport / deploy window done
* 18:55 addshore@deploy1001: Synchronized wmf-config: [[gerrit:569263]] [[phab:T241975|T241975]] Wikibase: Remove config option wmgUseEntitySourceBasedFederation (duration: 00m 58s)
* 18:54 addshore@deploy1001: sync-file aborted: [[gerrit:569263]]  (duration: 00m 00s)
* 18:51 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:608944]] [[phab:T241975|T241975]] Wikibase: stop using wmgUseEntitySourceBasedFederation (duration: 00m 56s)
* 18:47 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration (take 2) [[gerrit:608839]] (duration: 00m 56s)
* 18:45 addshore@deploy1001: Synchronized wmf-config: [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration (take 2) [[gerrit:608839]] (duration: 00m 58s)
* 18:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T256906|T256906]] [[phab:T256907|T256907]] [[phab:T256909|T256909]] [[phab:T254315|T254315]] [[gerrit:569260]] Commons: Define entity sources configuration (duration: 00m 56s)
* 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|adffbe6}}: Enable validation of new signatures ([[phab:T248632|T248632]]) (duration: 00m 57s)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|8878c60}}: Add `abusefilter-view` as a default right for the CU log user ([[phab:T255506|T255506]]) (duration: 00m 55s)
* 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1398171}}: Add arbcom group to plwiki ([[phab:T256572|T256572]]) (duration: 00m 56s)
* 18:08 andrew@deploy1001: Finished deploy [horizon/deploy@bb176c2]: update proxy UI to support multiple pre-set domains (duration: 03m 39s)
* 18:04 andrew@deploy1001: Started deploy [horizon/deploy@bb176c2]: update proxy UI to support multiple pre-set domains
* 17:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on all wikis - [[phab:T249261|T249261]] - take 2 (duration: 00m 56s)
* 17:50 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on all wikis - [[phab:T249261|T249261]] (duration: 00m 56s)
* 16:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on group1 - [[phab:T249261|T249261]] (duration: 00m 58s)
* 15:02 jynus: removing old snapshots for x1 on dbprov[12]002
* 14:50 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:46 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:44 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:42 moritzm: installing PHP 7.0 security updates
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11753 and previous config saved to /var/cache/conftool/dbconfig/20200706-143754-marostegui.json
* 14:36 godog: reboot ms-be2025 for hw raid software upgrade - [[phab:T257214|T257214]]
* 14:28 godog: powercycle ms-be2025, no ssh available - [[phab:T257214|T257214]]
* 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 marostegui: Stop MySQL and poweroff db1079 [[phab:T257216|T257216]]
* 14:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:02 jynus@cumin1001: dbctl commit (dc=all): 'depool db1136 from main traffic as it is the only s7 api host right now', diff saved to https://phabricator.wikimedia.org/P11752 and previous config saved to /var/cache/conftool/dbconfig/20200706-140217-jynus.json
* 13:56 marostegui: Downtime and reboot db1079 after BBU crash
* 13:54 jynus@cumin1001: dbctl commit (dc=all): 'depool db1079', diff saved to https://phabricator.wikimedia.org/P11751 and previous config saved to /var/cache/conftool/dbconfig/20200706-135430-jynus.json
* 13:30 marostegui: Deploy schema change on s5 codfw master [[phab:T253276|T253276]]
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce es1024 weight in preparation for tomorrow's switchover [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11750 and previous config saved to /var/cache/conftool/dbconfig/20200706-132634-marostegui.json
* 13:03 elukey: force umount/mount of /mnt/hdfs on an-airflow1001 to unblock dpkg checks (fuse misbehaving, all checks hanging)
* 12:53 elukey: kill hanging lsof processes on an-airflow to reduce cpu load
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P11748 and previous config saved to /var/cache/conftool/dbconfig/20200706-124237-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P11747 and previous config saved to /var/cache/conftool/dbconfig/20200706-124105-marostegui.json
* 11:17 Urbanecm: EU B&C window was done
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d971dc}}: GrowthExperiments: Remove overrides to welcome survey privacy policy URL ([[phab:T252572|T252572]]) (duration: 00m 56s)
* 11:12 marostegui: Deploy schema changes on db1129
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11746 and previous config saved to /var/cache/conftool/dbconfig/20200706-111221-marostegui.json
* 11:09 marostegui: Compress InnoDB on db1107 [[phab:T254462|T254462]]
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f4b5001}}: Add arxiv.org to commonswiki wgCopyUploadsDomains ([[phab:T257036|T257036]]) (duration: 00m 56s)
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P11745 and previous config saved to /var/cache/conftool/dbconfig/20200706-110723-marostegui.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11744 and previous config saved to /var/cache/conftool/dbconfig/20200706-110544-marostegui.json
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3bc1b46}}: Remove "Create a book" link from sidebar on Finnish Wikipedia ([[phab:T257073|T257073]]) (duration: 00m 56s)
* 10:52 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:609762{{!}} Bumping portals to master (609762)]] (duration: 00m 57s)
* 10:51 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:609762{{!}} Bumping portals to master (609762)]] (duration: 00m 56s)
* 10:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:28 moritzm: rebooting idp1001 for kernel update
* 09:35 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 00m 58s)
* 08:51 XioNoX: cr1-codfw> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 08:44 XioNoX: cr3-ulsfo> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:24 kormat: restarting all mariadb instances on sanitarium hosts [[phab:T256545|T256545]]
* 08:09 elukey: roll restart aqs on aqs100[4-9] to pick up new druid settings
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11742 and previous config saved to /var/cache/conftool/dbconfig/20200706-080509-marostegui.json
* 07:58 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather more feedback
* 07:51 elukey: enable binlog on matomo's database on matomo1002
* 07:46 XioNoX: repool eqsin - [[phab:T257154|T257154]]
* 07:11 XioNoX: reboot cr3-eqsin - [[phab:T257154|T257154]]
* 06:55 XioNoX: depool eqsin for cr3-eqsin reboot/investigation - [[phab:T257154|T257154]]
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P11740 and previous config saved to /var/cache/conftool/dbconfig/20200706-065437-marostegui.json
* 06:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 06:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 06:14 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 05:45 kart_: Updated cxserver to 2020-07-01-044435-production ([[phab:T254143|T254143]])
* 05:40 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:36 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:32 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11739 and previous config saved to /var/cache/conftool/dbconfig/20200706-051333-marostegui.json
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11738 and previous config saved to /var/cache/conftool/dbconfig/20200706-050347-marostegui.json
* 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11737 and previous config saved to /var/cache/conftool/dbconfig/20200706-044908-marostegui.json
 
== 2020-07-05 ==
* 21:50 qchris: Restarting gerrit on gerrit1001 to pick up new war and jars.
* 21:50 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 (duration: 00m 07s)
* 21:50 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001
* 21:46 qchris: Restarting gerrit on gerrit2001 to pick up new war and jars.
* 21:45 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 (duration: 00m 10s)
* 21:45 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001
* 21:32 qchris: Restarting gerrit on gerrit1002 to pick up new wars and jars.
* 21:32 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 (duration: 00m 08s)
* 21:32 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67
* 21:20 qchris: Enable puppet on gerrit1002 (gerrit-test) again to let it catch up
* 16:01 gehel: restart elastic-psi on elastic1052 (high GC rate)
* 15:56 gehel: restart blazegraph + updater on wdqs1007 and depool to allow catching up on lag
 
== 2020-07-04 ==
* 19:23 qchris@deploy1001: Finished deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002 (duration: 00m 08s)
* 19:23 qchris@deploy1001: Started deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002
* 14:05 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather feedback
* 12:42 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
* 02:28 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/includes/Score.php: Short circuit lilypond version check to allow usage of cached files [[phab:T257066|T257066]] (duration: 00m 55s)
 
== 2020-07-03 ==
* 21:49 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/: Sync maintenance script (duration: 00m 58s)
* 18:47 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕒☕ sudo systemctl restart hive-server2.service
* 16:51 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ifa929b2ad4}} (duration: 00m 57s)
* 16:02 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Rename wgRestrictionMethod to wgShellRestrictionMethod (duration: 00m 58s)
* 15:46 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight to spread load more evenly', diff saved to https://phabricator.wikimedia.org/P11730 and previous config saved to /var/cache/conftool/dbconfig/20200703-154337-jynus.json
* 15:40 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:38 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 15:02 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99)
* 14:11 _joe_: restarted php-fpm on wtp1033, stuck in sigill
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 12:41 hashar: Restarting Zuul / CI
* 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:29 moritzm: rebooting urldownloader standby hosts for kernel updates (1002/2002)
* 10:59 moritzm: installing json-c security updates on jessie
* 10:51 moritzm: installing ruby-json security updates
* 10:25 moritzm: installing nss security updates on jessie
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:15 elukey: notebook1004 renamed to an-scheduler1001
* 10:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:43 moritzm: rebooting netflow* hosts for kernel security update
* 08:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:04 jayme: authdns-update for chartmuseum - [[phab:T256970|T256970]]
* 08:03 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:55 moritzm: installing mutt security updates for jessie (stretch/buster already fixed)
* 07:44 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:47 moritzm: installing php5 security updates
* 06:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:09 moritzm: rebooting mw1390-mw1419 for kernel security updates
* 05:46 XioNoX: remove chassis redundancy failover from fasw-c-eqiad for consistency with all other VCs
* 05:33 XioNoX: remove chassis redundancy failover from fasw-c-codfw for consistency with all other VCs
 
== 2020-07-02 ==
* 23:22 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:16 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:03 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 21:56 mutante: gerrit1001 (prod gerrit) - restarting gerrit service
* 21:52 maryum: frwikibooks reindex successful, continuing on with remainder of French wikis
* 21:32 mutante: gerrit - deleted gerrit db_pass from prod private repo, running puppet
* 21:25 mutante: gerrit2001 - restarted gerrit
* 21:14 mutante: gerrit1002 restarted gerrit
* 20:20 maryum: reindexing frwikibooks to test https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/604221
* 19:52 mutante: gerrit2001 - restarting gerrit after removing db_pass from config
* 16:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:42 moritzm: rebooting mw1370-mw1389 for kernel security updates
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:03 kormat: stopped mariadb@s8 on dbstore1005 for data restoration [[phab:T256966|T256966]]
* 12:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:31 moritzm: rebooting mw1349-mw1369 for kernel security updates
* 12:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:27 vgutierrez: rolling restart of esams load balancers to catch up on kernel upgrades
* 12:12 XioNoX: pre-configure asw2-b-eqiad<->cloudsw1-c8-eqiad - [[phab:T251632|T251632]]
* 12:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 vgutierrez: rolling restart of codfw load balancers to catch up on kernel upgrades
* 11:18 akosiaris: proactively restart docker-registry on registry1001, registry1002 to force CA refresh
* 11:16 akosiaris: restart docker-registry on registry2002 for CA refresh
* 11:14 _joe_: restarting docker-registry on registry2001
* 10:34 godog: move "cluster overview" dashboard to Thanos - [[phab:T256954|T256954]]
* 09:35 XioNoX: advertise codfw prefixes from eqord
* 09:28 jayme: imported chartmuseum_0.12.0-2 to buster-wikimedia - [[phab:T253843|T253843]]
* 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "DCausse_(WMF)" # [[phab:T256949|T256949]]
* 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "Addshore" # [[phab:T256949|T256949]]
* 08:59 XioNoX: deploy flex flow for MX204s - [[phab:T248394|T248394]]
* 05:52 _joe_: removing all tags for envoy-tls-local-proxy
* 05:46 _joe_: upload docker-report 0.0.4 on buster-wikimedia [[phab:T242604|T242604]]
* 04:32 eileen: process-control config revision is {{Gerrit|b4655897b5}}
* 03:17 eileen: process-control config revision is {{Gerrit|12fe6b5151}}
* 03:15 eileen: tools revision changed from {{Gerrit|4ea8567819}} to {{Gerrit|e974147f27}}
* 02:32 eileen: tools revision changed from {{Gerrit|e38f7a83d4}} to {{Gerrit|4ea8567819}}
* 00:53 eileen: tools revision changed from {{Gerrit|806e2b4412}} to {{Gerrit|e38f7a83d4}}
 
== 2020-07-01 ==
* 23:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgForceUIAsContentMsg for zhwikibooks, zhwikinews, zhwikiquote, zhwikisource, zhwikiversity, zhwiktionary ([[phab:T256521|T256521]]) (duration: 00m 55s)
* 23:35 ejegg: updated fundraising CiviCRM from {{Gerrit|391d0fdf75}} to {{Gerrit|a48caf0f37}}
* 23:32 catrope@deploy1001: Synchronized static/images/project-logos/: Change Simplified Chinese logo for zhwiki ([[phab:T256839|T256839]]) (duration: 00m 55s)
* 23:18 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ibb42db7fd1ee}} (duration: 00m 55s)
* 23:00 bstorm: set a short downtime on labstore1006/7 to prevent alert while disabling direct systemd monitoring
* 22:37 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/includes/Title.php: {{Gerrit|I8d5bad9c654c4ab}} (duration: 01m 00s)
* 21:00 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:58 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:56 Krinkle: krinkle@deploy1001 Ran `scap deploy --init` for /srv/deployment/performance/arc-lamp
* 20:55 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d7476f5]: Update mobileapps to {{Gerrit|953fc41a}} (duration: 04m 08s)
* 20:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d7476f5]: Update mobileapps to {{Gerrit|953fc41a}}
* 20:27 eileen: tools revision changed from {{Gerrit|6f38c14fe3}} to {{Gerrit|806e2b4412}}
* 20:11 eileen: tools revision changed from {{Gerrit|aab96444df}} to {{Gerrit|6f38c14fe3}}
* 19:23 twentyafterfour: 1.35.0-wmf.39 is now deployed to group2 wikis, everything appears to be normal.  refs [[phab:T254176|T254176]]
* 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.39  refs [[phab:T254176|T254176]]
* 18:44 addshore@deploy1001: Synchronized wmf-config: REVERT [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration [[gerrit:569259]] (duration: 01m 04s)
* 18:41 addshore@deploy1001: sync-file aborted: [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration [[gerrit:569259]] (duration: 00m 38s)
* 18:38 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf] (duration: 02m 19s)
* 18:36 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf]
* 18:35 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf] (duration: 08m 09s)
* 18:27 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf]
* 18:25 joal@deploy1001: Finished deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed] (duration: 03m 41s)
* 18:21 joal@deploy1001: Started deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed]
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable kafka purges on wikitech gerrit:607590 IS-labs.php (duration: 01m 03s)
* 18:07 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy MediaModeration on all production wikis gerrit:608753 (duration: 01m 07s)
* 17:14 XioNoX: set flex-flow-sizing to cr2-eqsin - [[phab:T248394|T248394]]
* 16:57 XioNoX: restart cr2-eqsin for software upgrade - [[phab:T243080|T243080]]
* 16:00 XioNoX: updating eqsin LVS BGP neighbors IPs - [[phab:T255766|T255766]]
* 15:16 XioNoX: re0.cr1-eqsin> request system power-off both-routing-engines - [[phab:T255766|T255766]]
* 15:15 XioNoX: disable BGP to pybal on cr1-eqsin - [[phab:T255766|T255766]]
* 15:13 XioNoX: disable cr1-eqsin transit/peering BGP - [[phab:T255766|T255766]]
* 15:09 XioNoX: bump eqsin-codfw ospf link cost - [[phab:T255766|T255766]]
* 15:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 XioNoX: move vrrp master to cr2-eqsin - [[phab:T255766|T255766]]
* 15:00 XioNoX: depool eqsin for routers work - [[phab:T255766|T255766]]
* 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:37 hashar: contint1001 stopped zuul-merger for a test. started it again
* 13:35 hashar: Restarting zuul-merger on contint2001 # [[phab:T252310|T252310]]
* 13:30 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 08s)
* 13:30 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
* 13:29 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 32s)
* 13:28 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
* 13:16 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.39 (duration: 01m 04s)
* 13:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.39
* 13:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕘☕ sudo apt remove valgrind libc6-dbg
* 13:03 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo cumin 'netflow[3-5]001*' 'systemctl restart nfacctd'
* 12:58 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo debdeploy deploy -u 2020-07-01-pmacct.yaml -s netflow
* 12:55 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@apt1001.wikimedia.org ~ 🕘☕ sudo -E reprepro -C main include buster-wikimedia pmacct_1.7.2-3+wmf1_amd64.changes
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 ema: A:cp upgrade librdkafka1 to 0.11.6-1.1wmf1 and restart purged, varnishkafka [[phab:T256444|T256444]]
* 11:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254315|T254315]] Wikidata: Define entity sources configuration [[gerrit:569258]] (duration: 01m 06s)
* 11:32 Lucas_WMDE: EU B&C window done
* 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized w/touch.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 4/4 (duration: 01m 06s)
* 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized w/robots.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 3/4 (duration: 01m 03s)
* 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized w/favicon.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 2/4 (duration: 01m 04s)
* 11:19 lucaswerkmeister-wmde@deploy1001: Synchronized w/extract2.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 1/4 (duration: 01m 16s)
* 11:07 Amir1: Changing datatype of several properties with mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php ([[phab:T255241|T255241]])
* 11:07 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:02 ema: restbase2009 depooled [[phab:T256863|T256863]]
* 11:02 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
* 10:50 ema: power on restbase2009
* 10:45 jayme: draining and docker restart (one at a time) kubernetes[1001-1004].eqiad.wmnet - [[phab:T256786|T256786]]
* 10:34 ema: power-cycle restbase2009
* 10:17 XioNoX: renumber NTT transit links - [[phab:T254877|T254877]]
* 10:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:09 jayme: draining and docker restart (one at a time) kubernetes[2001-2004].codfw.wmnet
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:46 jayme: cordoning kubernetes[2001-2004].codfw.wmnet,kubernetes[1001-1004].eqiad.wmnet - [[phab:T256786|T256786]]
* 09:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:23 jayme: restarting dockerd on kubestage1002.eqiad.wmnet - [[phab:T256786|T256786]]
* 09:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:53 jayme: draining kubernetes staging node kubestage1001.eqiad.wmnet - [[phab:T256786|T256786]]
* 08:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:29 XioNoX: disable BGP to nfacct in eqiad - [[phab:T256790|T256790]]
* 08:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 08:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:01 vgutierrez: rolling restart of esams cache nodes to catch up on kernel upgrades
* 07:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:39 ema: cp2041: restart purged, varnishkafka after librdkafka1 upgrade to 0.11.6-1.1wmf1 [[phab:T256444|T256444]]
* 05:47 _joe_: restarting nfacctd on netflow1001, it's segfaulting
* 04:01 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/maintenance/findBadBlobs.php: {{Gerrit|I47c11190b665}} (duration: 01m 08s)
* 00:14 krinkle@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T254795|T254795]] - Set $wmgXhguiDBuser and $wmgXhguiDBpasswor (duration: 01m 06s)
 
== 2020-06-30 ==
* 21:48 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:45 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:43 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:42 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:40 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:40 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:38 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:19 hashar@deploy1001: rebuilt and synchronized wikiversions files: group 0 wikis to 1.35.0-wmf.39 # [[phab:T254176|T254176]]
* 18:31 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕝☕ sudo apt install valgrind
* 18:27 tgr: Morning deploys done
* 18:23 tgr@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/ElectronPdfService/src/ElectronPdfServiceHooks.php: Backport: [[gerrit:608485{{!}}Hotfix: "Undefined index: print" (T256761)]] (duration: 01m 05s)
* 18:11 shdubsh: restart varnishmtail,atsmtail,ncredirmtail on ncredir,cp hosts in codfw and eqsin
* 18:05 cdanis: installing libc6-dbg on netflow2001 [[phab:T256790|T256790]]
* 17:40 mdholloway: mobileapps deployments on k8s failing with timeouts; filed [[phab:T256786|T256786]]
* 17:37 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕜☕ sudo systemctl restart nfacctd
* 17:33 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:18 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:17 papaul: unplugging msw-c3 power to relocate port on PDU
* 17:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f9df1af]: Update mobileapps to {{Gerrit|5c7611b9}} (duration: 03m 33s)
* 17:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f9df1af]: Update mobileapps to {{Gerrit|5c7611b9}}
* 16:57 cdanis: [[phab:T256444|T256444]] restarted purged on cp2030 and repooling
* 16:48 cdanis: [[phab:T256444|T256444]] ✔️ cdanis@cp2030.codfw.wmnet ~ 🕐☕ sudo depool
* 15:54 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3 (duration: 00m 03s)
* 15:54 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:16 otto@deploy1001: Finished deploy [analytics/refinery@1112749]: roll back to {{Gerrit|1112749}} on an-launcher1002, git-fat not pulling artifacts (duration: 01m 21s)
* 15:14 otto@deploy1001: Started deploy [analytics/refinery@1112749]: roll back to {{Gerrit|1112749}} on an-launcher1002, git-fat not pulling artifacts
* 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:10 moritzm: rebooting mwdebug* hosts for kernel security update
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:59 moritzm: rebooting failoid hosts for kernel update
* 14:49 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3 (duration: 00m 03s)
* 14:49 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3
* 14:47 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 2 (duration: 00m 03s)
* 14:47 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 2
* 14:44 hashar: Train blocked on Flow being broken: [[phab:T256761|T256761]]  # [[phab:T254176|T254176]]
* 14:38 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.35.0-wmf.39" - [[phab:T256759|T256759]]
* 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.39
* 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:15 moritzm: rebooting miscweb servers for kernel security update
* 14:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:10 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] (duration: 01m 56s)
* 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:09 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.39 (duration: 62m 30s)
* 14:08 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]]
* 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:37 moritzm: rebooting LDAP replicas for kernel security update
* 13:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:07 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.39
* 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 awight: EU BACON cooked
* 11:32 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:608478{{!}}Configure TeWü survey on dewiki (take 2) (T253112)]] (duration: 00m 58s)
* 11:32 jayme: restarted docker-reporter-base-images and docker-reporter-releng-images on deneb - [[phab:T253396|T253396]]
* 11:31 jayme: pushed a scratch docker image as docker-registry.discovery.wmnet/envoy-tls-local-proxy:dontuseme - [[phab:T253396|T253396]]
* 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/QuickSurveys: BACON: [[gerrit:608477{{!}}Embedded surveys are hidden when no element is available (T256627)]] (duration: 00m 56s)
* 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/FileImporter: BACON: [[gerrit:608476{{!}}Set Status error if permission check returns false. (T256428)]] (duration: 00m 58s)
* 11:13 ema: deneb: systemctl restart docker-reporter-base-images.service
* 10:59 ema: upload librdkafka 0.11.6-1.1wmf1 to buster-wikimedia https://phabricator.wikimedia.org/P11703 [[phab:T256444|T256444]]
* 10:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11710 and previous config saved to /var/cache/conftool/dbconfig/20200630-105254-marostegui.json
* 10:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:41 ema: cp2040: restart purged and varnishkafka to use updated librdkafka1 [[phab:T256444|T256444]]
* 10:38 ema: cp2040: upgrade librdkafka1 to 0.11.6-1.1wmf1 https://phabricator.wikimedia.org/P11703 [[phab:T256444|T256444]]
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:30 hashar@deploy1001: Synchronized php-1.35.0-wmf.39/includes/specials/SpecialUndelete.php: Remove another use of PageArchive::getRevision - [[phab:T249982|T249982]] [[phab:T254176|T254176]] (duration: 00m 56s)
* 10:09 marostegui: Deploy schema change on db1076
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11708 and previous config saved to /var/cache/conftool/dbconfig/20200630-100912-marostegui.json
* 10:04 vgutierrez: rolling restart of eqiad cache nodes to catch up on kernel upgrades
* 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 07s)
* 10:02 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 09:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.37 (duration: 02m 20s)
* 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:21 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 28m 11s)
* 08:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:53 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 00m 00s)
* 08:51 hashar: Applied security patches to wmf/1.35.0-wmf.39 # [[phab:T254176|T254176]]
* 08:51 vgutierrez: rolling restart of codfw cp nodes after "re-formatting" nvme devices - [[phab:T256655|T256655]]
* 08:23 vgutierrez: repool cp3053 - [[phab:T256632|T256632]]
* 08:10 hashar: 1.35.0-wmf.39 was branched at {{Gerrit|e169e3dabcb2217809fc41ba44b43a39ae1a678e}} [[phab:T254176|T254176]]
* 08:05 marostegui: Stop MySQL on db1117:3322 to clone db1080 (this will trigger haproxy alerts) - [[phab:T256717|T256717]]
* 08:05 vgutierrez: powercycle cp3053 (unresponsive after reboot) - [[phab:T256632|T256632]]
* 08:01 jbond42: disable puppet to restart puppetmasters front ends
* 07:42 vgutierrez: reboot cp3053 - [[phab:T256632|T256632]]
* 05:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 05:13 marostegui: Deploy schema change on s8 codfw - [[phab:T256680|T256680]]
* 04:58 marostegui: remove pl_from index from db1141, db1121, db1148 - [[phab:T256684|T256684]]
* 04:57 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 04:56 marostegui: Remove pl_from index from db1096:3316 and db1098:3316 - [[phab:T256684|T256684]]
 
== 2020-06-29 ==
* 23:28 eileen: civicrm revision changed from {{Gerrit|52a32f2d66}} to {{Gerrit|391d0fdf75}}, config revision is {{Gerrit|f1b4bdb7b7}}
* 22:00 sbassett: Deployed patch for [[phab:T256171|T256171]]
* 21:56 sbassett: Deployed patch for [[phab:T255918|T255918]]
* 20:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315 [[phab:T256679|T256679]]', diff saved to https://phabricator.wikimedia.org/P11699 and previous config saved to /var/cache/conftool/dbconfig/20200629-200002-marostegui.json
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 [[phab:T256679|T256679]]', diff saved to https://phabricator.wikimedia.org/P11698 and previous config saved to /var/cache/conftool/dbconfig/20200629-194327-marostegui.json
* 18:55 shdubsh: test mtail rc35+wmf2 on cp5001 - [[phab:T255776|T255776]]
* 18:15 Urbanecm: Morning B&C done
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c86fcd4}}: Add HTTP proxy to MediaModeration ([[phab:T247943|T247943]]) (duration: 00m 58s)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|aeb7b52}}: Setup rollbacker and mover on lijwiki ([[phab:T256109|T256109]]) (duration: 02m 05s)
* 17:30 sukhe: LDAP - added datn to groups wmde, nda - [[phab:T254442|T254442]]
* 15:43 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:43 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:37 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11696 and previous config saved to /var/cache/conftool/dbconfig/20200629-153140-marostegui.json
* 15:20 gehel: repool wdqs1004 - caught up on lag
* 14:50 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy (duration: 00m 06s)
* 14:50 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy
* 14:48 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 13m 40s)
* 14:34 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
* 14:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 17m 49s)
* 14:28 ema: A:cp rolling purged upgrade to 0.16 [[phab:T256479|T256479]]
* 14:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:608309{{!}}Add "E" as an alias of EntitySchema namespace on wikidata (T245529)]] (duration: 00m 57s)
* 14:20 ema: upload purged 0.16 to apt.wm.org [[phab:T256479|T256479]]
* 14:16 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
* 14:14 hnowlan@deploy1001: Finished deploy [restbase/deploy@ce5177e]: Enable gom wiktionary (duration: 20m 44s)
* 14:02 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Fix 'closed-labs' reading as 'closed' for static config (duration: 00m 56s)
* 13:54 jforrester@deploy1001: Synchronized dblists/: Drop nonbetafeatures dblist, unused (duration: 00m 57s)
* 13:54 hnowlan@deploy1001: Started deploy [restbase/deploy@ce5177e]: Enable gom wiktionary
* 13:50 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Drop 'nonbetafeatures' dblist from production reads (duration: 00m 56s)
* 13:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch uses from nonbetafeatures to lockeddown (duration: 00m 57s)
* 13:47 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Add 'lockeddown' dblist to production reads (duration: 00m 57s)
* 13:43 jforrester@deploy1001: Synchronized dblists/lockeddown.dblist: Add lockeddown dblist (unused as yet) (duration: 00m 59s)
* 13:35 vgutierrez: depool cp3053 due to nvme hardware issues
* 13:02 XioNoX: test pfw3-codfw uplinks failover
* 13:00 elukey: move archiva.wikimedia.org to archiva1002 (new buster vm); create archiva-old.wikimedia.org to archiva1001
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11693 and previous config saved to /var/cache/conftool/dbconfig/20200629-125824-marostegui.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11692 and previous config saved to /var/cache/conftool/dbconfig/20200629-125630-marostegui.json
* 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:32 jayme: deleted all tags for docker-registry.wikimedia.org/envoy-tls-local-proxy from docker registry - [[phab:T253396|T253396]]
* 12:20 marostegui: Stop MySQL on db2096 (codfw x1 master) for reimage [[phab:T254871|T254871]]
* 12:03 cdanis: re-pool eqiad [[phab:T256512|T256512]]
* 11:59 cdanis: deployed {{Gerrit|I132075ee}} on cr1-eqiad [[phab:T256512|T256512]]
* 11:58 cdanis: deployed {{Gerrit|I132075ee}} on cr2-eqiad [[phab:T256512|T256512]]
* 11:41 cdanis: depool eqiad [[phab:T256512|T256512]]
* 11:15 awight: EU BACON cooked
* 11:08 marostegui: Deploy schema change on db1095:3312 (lag will show up)
* 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:608284{{!}} Bumping portals to master (608284)]] (duration: 00m 57s)
* 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:608284{{!}} Bumping portals to master (608284)]] (duration: 00m 58s)
* 10:29 gehel: restart blazegraph on wdqs1004 + depool to catch up on lag
* 09:59 ema: cp2040: upgrade purged to 0.16 [[phab:T256479|T256479]]
* 09:59 jbond42: switch idp to memcached
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:45 marostegui: Deploy schema change on dbstore1004:3312
* 09:11 jbond42: deploying shellcheck CI https://gerrit.wikimedia.org/r/c/operations/puppet/+/602693
* 08:59 marostegui: Compress InnoDB on db1089 (this will cause lag and will take a few days) - [[phab:T254462|T254462]]
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for InnoDB compression [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P11690 and previous config saved to /var/cache/conftool/dbconfig/20200629-085854-marostegui.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11688 and previous config saved to /var/cache/conftool/dbconfig/20200629-084827-marostegui.json
* 08:40 ema: cp2034: restart purged [[phab:T256444|T256444]]
* 08:36 ema: cp4025: restart purged [[phab:T256444|T256444]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11687 and previous config saved to /var/cache/conftool/dbconfig/20200629-083631-marostegui.json
* 08:33 ema: cp1087, cp2033, cp2037, cp2039: repool after spending (way) more than 24h depooled [[phab:T256444|T256444]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11686 and previous config saved to /var/cache/conftool/dbconfig/20200629-082635-marostegui.json
* 08:24 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T253276|T253276]]
* 08:04 XioNoX: add term selected-paths to policy BGP_IXP_in on all routers
* 08:03 godog: prometheus eqiad -- lvextend --resizefs --size +200G vg-ssd/prometheus-ops
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11685 and previous config saved to /var/cache/conftool/dbconfig/20200629-080253-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1135 (depooled) to s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11684 and previous config saved to /var/cache/conftool/dbconfig/20200629-074611-marostegui.json
* 07:16 XioNoX: push new pfw firewall rules - [[phab:T256170|T256170]]
* 07:13 marostegui: Deploy schema change on db1085 with replication to labs [[phab:T253276|T253276]]
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P11683 and previous config saved to /var/cache/conftool/dbconfig/20200629-071236-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1080 from MW', diff saved to https://phabricator.wikimedia.org/P11682 and previous config saved to /var/cache/conftool/dbconfig/20200629-065335-marostegui.json
* 06:50 elukey: execute gnt-instance remove an-launcher1001.eqiad.wmnet on ganeti1011 - [[phab:T256363|T256363]]
* 06:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:46 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 06:45 marostegui: Deploy MCR schema change on db1090:3312
* 06:35 elukey: force puppet run on ores* to overcome celery OOMs on some nodes
* 04:57 marostegui: Stop MySQL on db1080 to clone db1135 [[phab:T253217|T253217]]
* 04:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-06-28 ==
* 21:43 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op {{Gerrit|I56eb4a802}} (duration: 00m 58s)
* 21:38 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only {{Gerrit|I56eb4a802}} (duration: 01m 00s)
 
== 2020-06-27 ==
* 20:22 qchris: Gerrit upgrade done.
* 19:49 mutante: removed 2620:0:861:3:208:80:154:136 from /etc/network/interfaces on gerrit1001, rebooting
* 19:27 mutante: rebooting gerrit1001 one more time
* 19:24 mutante: restarted ferm on gerrit1001
* 19:19 mutante: rebooting gerrit1001 one more time
* 19:05 mutante: rebooting gerrit1001
* 18:58 mutante: rebooting gerrit2001
* 18:49 hashar: Enabling beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
* 18:35 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001 (duration: 00m 10s)
* 18:34 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001
* 18:27 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001 (duration: 00m 08s)
* 18:27 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001
* 17:25 hashar: Disabled beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
* 17:19 qchris: Stopping gerrit on gerrit1001 for the Gerrit upgrade
* 17:14 qchris: Duplicating reviewdb changes so we get a cheap and quick rollback
* 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:11 qchris: Disabling puppet on gerrit1001 for Gerrit upgrades + data migrations
* 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:07 qchris: Starting Gerrit upgrade to v3.2.2-98-g98d827eaa3
* 15:44 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test) (duration: 00m 08s)
* 15:44 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test)
* 13:03 qchris@deploy1001: Finished deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test) (duration: 00m 08s)
* 13:03 qchris@deploy1001: Started deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test)
 
== 2020-06-26 ==
* 18:42 robh: all ulsfo onsite work completed as of 30 minutes ago
* 17:52 robh: msw2-ulsfo work done, all mgmt items confirmed back online and icinga alerts cleared, moving onto msw1-ulsfo (rack 22) and will lose all mgmt in that rack for next 10-20 minutes [[phab:T256300|T256300]]
* 17:11 robh: msw work in ulsfo via [[phab:T256300|T256300]]
* 10:24 ema: pool cp5006 [[phab:T256449|T256449]]
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11677 and previous config saved to /var/cache/conftool/dbconfig/20200626-102248-marostegui.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11676 and previous config saved to /var/cache/conftool/dbconfig/20200626-102201-marostegui.json
* 10:03 ema: cp2039: restart purged [[phab:T256444|T256444]]
* 09:57 ema: cp2037: restart purged [[phab:T256444|T256444]]
* 09:55 ema: cp1087: restart purged [[phab:T256444|T256444]]
* 09:46 ema: cp2033: restart purged [[phab:T256444|T256444]]
* 09:38 akosiaris: move the sessionstore eqiad pods back to the dedicated sessionstore nodes
* 09:37 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 09:35 akosiaris: move the sessionstore codfw pods back to the dedicated sessionstore nodes
* 09:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P11675 and previous config saved to /var/cache/conftool/dbconfig/20200626-090813-marostegui.json
* 08:58 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:56 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088', diff saved to https://phabricator.wikimedia.org/P11674 and previous config saved to /var/cache/conftool/dbconfig/20200626-083319-marostegui.json
* 08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P11673 and previous config saved to /var/cache/conftool/dbconfig/20200626-082242-marostegui.json
* 08:20 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:20 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 08:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes.*.wmnet
* 08:04 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes.*.wmnet
* 08:04 akosiaris: pool all new kubernetes nodes in LVS [[phab:T252185|T252185]] [[phab:T256236|T256236]]
* 07:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:44 volans: force rebooted cp5006 that is unresponsive (after having depooled it) - [[phab:T256449|T256449]]
* 07:42 volans@cumin1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet
* 06:40 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: add cache-cookies log channel (duration: 00m 59s)
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312, db2104', diff saved to https://phabricator.wikimedia.org/P11672 and previous config saved to /var/cache/conftool/dbconfig/20200626-051328-marostegui.json
* 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 04:01 cdanis: re-enable puppet on cps
* 03:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛🍺 sudo cumin A:cp 'disable-puppet "I39e1c68a is broken"'
* 03:54 cdanis: https://gerrit.wikimedia.org/r/c/operations/puppet/+/607917
* 02:52 tstarling@deploy1001: Synchronized private/PrivateSettings.php: updating wgAuthenticationTokenVersion per my wikitech-l post (duration: 00m 57s)
* 02:19 cdanis: three more hosts not processing purges for multiple days ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥🍺 sudo cumin 'cp2033*,cp2037*,cp2039*' 'depool'
* 02:17 cdanis: depooling cp1087 which has not been processing purges for 11.415 days
* 01:53 cdanis: {{Gerrit|I6cc5f3e6}} has been deployed to all cp text nodes [[phab:T256395|T256395]]
* 01:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I6cc5f3e6}} [[phab:T256395|T256395]]"'
* 01:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I6cc5f3e6}} [[phab:T256395|T256395]]"'
* 00:41 eileen: tools revision changed from {{Gerrit|c96813eda4}} to {{Gerrit|aab96444df}}
* 00:38 tstarling@deploy1001: Synchronized w/T256395-cookie-test.php: (no justification provided) (duration: 00m 56s)
* 00:36 tstarling@deploy1001: Synchronized w/T256395-cookie-test.php: (no justification provided) (duration: 00m 58s)
 
== 2020-06-25 ==
* 23:37 mutante: puppetmaster - signing certs and initial puppet run for logstash1030/logstash1031 - no prod role yet
* 22:25 mutante: puppetmaster - signing certs and initial run for logstash2030/2031 - no prod role yet
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:30 dcausse: repooling wdqs1007.eqiad.wmnet
* 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.38
* 18:58 mutante: LDAP - added qchris to archiva-deployers ([[phab:T256404|T256404]])
* 17:37 mutante: mwmaint1002 - restarted apache2 to add server_headers snippet for [[phab:T255629|T255629]] - but not working as expected yet
* 16:40 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:31 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:31 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 16:28 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|Ia6ef7617d378}} (duration: 01m 02s)
* 16:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:15 Krinkle: I've deleted a "saved object" visualisation in logstash called "Production Errors & Deployments" which seemed to be corrupt and redirect random logstash dashboards to a management page. Backed up at https://phabricator.wikimedia.org/P11666 (NDA)
* 16:15 moritzm: installing libxml2 security updates
* 16:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:06 moritzm: installing 4.9.210-1+deb9u1~deb8u1 on jessie hosts (fixed kernel for recent cacheoutattack CPU leaks)
* 16:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|I4c519f88c613fc}} (duration: 01m 05s)
* 15:54 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:51 vgutierrez: upgrade ATS in eqiad to version 8.0.8
* 15:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups (duration: 05m 09s)
* 15:37 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups
* 15:37 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups (duration: 03m 38s)
* 15:33 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups
* 15:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups (duration: 03m 24s)
* 15:30 vgutierrez: upgrade ATS in codfw to version 8.0.8
* 15:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:30 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], more groups
* 15:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], take 2 (duration: 06m 38s)
* 15:29 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: structured logging for xff log, stop logging jobrunner requests (duration: 01m 05s)
* 15:23 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]], take 2
* 15:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]] (duration: 01m 37s)
* 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters [[phab:T256358|T256358]]
* 14:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 vgutierrez: upgrade ATS in esams to version 8.0.8
* 14:29 papaul: replacing mr1-codfw
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:19 vgutierrez: upgrade ATS in eqsin to version 8.0.8
* 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:05 marostegui: Stop MySQL on db2104 and db2088:3312
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104', diff saved to https://phabricator.wikimedia.org/P11664 and previous config saved to /var/cache/conftool/dbconfig/20200625-140519-marostegui.json
* 14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:04 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2088:3312', diff saved to https://phabricator.wikimedia.org/P11663 and previous config saved to /var/cache/conftool/dbconfig/20200625-140421-marostegui.json
* 13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:57 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T254301|T254301]] Remove OAuthReplaceMessage hook subscriber (duration: 01m 05s)
* 13:56 vgutierrez: upgrade ATS in ulsfo to version 8.0.8
* 13:51 vgutierrez: upload trafficserver 8.0.8 to apt.wm.o (buster)
* 13:51 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Replace PasswordNotInLargeBlacklist with PasswordNotInCommonList (duration: 01m 05s)
* 13:49 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Replace PasswordNotInLargeBlacklist with PasswordNotInCommonList (duration: 01m 06s)
* 13:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:28 godog: bounce logstash on logstash1007
* 13:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:02 moritzm: installing 4.9.210-1+deb9u1~deb8u1 on jessie hosts (fixed kernel for recent cacheoutattack CPU leaks)
* 12:55 elukey: rename notebook1003 to an-launcher1002 - [[phab:T256363|T256363]]
* 12:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 12:42 moritzm: installing libmspack security updates
* 12:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:32 moritzm: installing libssh2 security updates
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 moritzm: installing libjpeg-turbo security updates
* 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:55 moritzm: installing python3.4 security updates
* 11:55 awight: EU BACON is cooked
* 11:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:50 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:607767{{!}}Enable QuickSurveys on metawiki (T253112)]] (duration: 01m 05s)
* 11:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:38 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:607763{{!}}Enable WMDE Tech Wishes survey configuration (T253112)]] (duration: 01m 09s)
* 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:27 moritzm: rolling reboot of ms-be[1044-1059].eqiad.wmnet
* 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:45 moritzm: rolling reboot of ms-be[2044-2056]
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:04 akosiaris: poweroff kubestagetcd1004 and ganeti1005 for [[phab:T244530|T244530]]
* 10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 09:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:28 akosiaris: schedule downtime for eqiad wikifeeds as it's flapping too much without yet knowing why. [[phab:T256358|T256358]]
* 09:28 godog: extend lv on thanos-fe2001 and restart thanos-compact
* 09:21 vgutierrez: rolling restart of ncredir instances to catch up on kernel updates
* 09:13 joal@deploy1001: Finished deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] (duration: 00m 10s)
* 09:13 joal@deploy1001: Started deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370]
* 09:13 joal@deploy1001: Finished deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] (duration: 16m 27s)
* 09:01 vgutierrez: restarting acme-chief instances to catch up on kernel updates
* 08:56 joal@deploy1001: Started deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370]
* 08:42 hashar: releases2002: restarted bacula-fd to take into account the puppet provided configuration # [[phab:T247652|T247652]]
* 08:14 jynus: restarting bacula-dir on backup1001
* 08:09 akosiaris: restart etherpad-lite on etherpad1002
* 08:03 marostegui: Failover m1 from db1135 to db1097 - [[phab:T254556|T254556]]
* 07:52 jynus: stop bacula-director on backup1001 for db maintenance [[phab:T254556|T254556]]
* 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 07:49 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 07:49 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 07:48 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:36 elukey: reboot an-launcher1001 for kernel upgrades
* 07:18 elukey: reboot kafkamon* vms for kernel upgrades
* 07:08 marostegui: Start pre switchover steps on m1 [[phab:T254556|T254556]]
* 06:40 elukey: reboot matomo1002 for kernel upgrades
* 06:35 elukey: reboot archiva1002 (new vm, not yet in service) for kernel upgrades
* 06:34 elukey: reboot archiva for kernel upgrades
* 06:31 elukey: force puppet run on ores1003/1005 to restore celery (killed by the oom)
* 06:24 elukey: reboot an-tool* vms for kernel upgrades
* 06:23 elukey: reboot analytics-tool1004 for kernel upgrades (Superset host)
* 06:22 elukey: reboot analytics-tool1001 for kernel upgrades
* 06:19 elukey: execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service)
* 06:03 elukey: reboot an-airflow1001 for kernel upgrades
* 04:26 marostegui: Remove triggers from db2095:3312 - [[phab:T238966|T238966]]
* 04:25 marostegui: Deploy schema change on s2 codfw - [[phab:T238966|T238966]]
* 00:48 twentyafterfour: restart php-fpm on phab1001 to fix [[phab:T256343|T256343]]
* 00:12 twentyafterfour: phabricator updated, all seems normal
* 00:11 twentyafterfour: updating phabricator to release/2020-06-25/1, momentary (<1 minute) downtime expected.
 
== 2020-06-24 ==
* 23:44 mutante: releases2002 - systemctl stop jenkins, kill 15244 (rogue jenkins process), start jenkins with systemctl start jenkins ([[phab:T247652|T247652]])
* 23:43 mutante: releases1002 - kill rogue jenkins process, start jenkins with systemctl start jenkins ([[phab:T247652|T247652]])
* 23:02 mutante: releases1002/2002 - disabling puppet, removing failing cron job to pull deployment_charts (because /srv/deployment-charts does not exist yet)
* 21:45 shdubsh: install mtail 3.0.0~rc35+wmf2 on logstash1007 - [[phab:T255776|T255776]]
* 20:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 06s)
* 20:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38
* 20:41 brennen: train 1.35.0-wmf.38: attempting to roll forward to group1 after php-fpm restart on mw1287 ([[phab:T256305|T256305]], [[phab:T254175|T254175]])
* 20:32 cdanis: restarting php-fpm on mw1287 [[phab:T256305|T256305]]
* 20:32 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:30 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:28 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:14 halfak@deploy1001: Finished deploy [ores/deploy@1b87365]: [[phab:T254505|T254505]] (duration: 14m 08s)
* 20:09 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@80c763d]: Update mobileapps to {{Gerrit|a413db4f}} (duration: 03m 37s)
* 20:06 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@80c763d]: Update mobileapps to {{Gerrit|a413db4f}}
* 20:00 halfak@deploy1001: Started deploy [ores/deploy@1b87365]: [[phab:T254505|T254505]]
* 19:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Migrate SearchSatisfaction from EventLogging to EventGate on group1 - [[phab:T249261|T249261]] (duration: 01m 06s)
* 19:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.37
* 19:11 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 04s)
* 19:10 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38
* 19:01 brennen: train 1.35.0-wmf.38: finished triage meeting, clear to proceed to group 1 ([[phab:T254175|T254175]])
* 18:53 joal@deploy1001: Finished deploy [analytics/refinery@1112749] (thin): Regular analytics weekly train THIN [analytics/refinery@1112749] (duration: 00m 09s)
* 18:53 joal@deploy1001: Started deploy [analytics/refinery@1112749] (thin): Regular analytics weekly train THIN [analytics/refinery@1112749]
* 18:53 joal@deploy1001: Finished deploy [analytics/refinery@1112749]: Regular analytics weekly train [analytics/refinery@1112749] (duration: 05m 50s)
* 18:49 Urbanecm: Morning B&C deploy window is done
* 18:48 cstone: payments-wiki revision changed from {{Gerrit|28ad76dcd7}} to {{Gerrit|91852dbc9b}}
* 18:47 Urbanecm: mwscript namespaceDupes.php --wiki=guwiki --fix ([[phab:T255358|T255358]])
* 18:47 joal@deploy1001: Started deploy [analytics/refinery@1112749]: Regular analytics weekly train [analytics/refinery@1112749]
* 18:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2a1dfc5}}: Set namespace aliases for guwiki ([[phab:T255358|T255358]]) (duration: 01m 05s)
* 18:42 Urbanecm: mwscript namespaceDupes.php --wiki=banwiki --add-prefix=[[phab:T255941|T255941]] --fix ([[phab:T255941|T255941]])
* 18:41 Urbanecm: Run mwscript namespaceDupes.php --wiki=banwiki --fix ([[phab:T255941|T255941]])
* 18:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c6d6c85}}: Set WP as a NS_PROJECT alias for banwiki ([[phab:T255941|T255941]]) (duration: 01m 06s)
* 18:38 Urbanecm: Run mwscript namespaceDupes.php dewiktionary --fix ([[phab:T256242|T256242]])
* 18:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b93e0f}}: Define Rekonstruktion NS for dewiktionary ([[phab:T256242|T256242]]) (duration: 01m 05s)
* 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dea9214}}: Revert "IS: Cleanup some redundant rows." ([[phab:T256279|T256279]]) (duration: 01m 05s)
* 18:25 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventBus: Emit kafka purges for everything gerrit:607298 (duration: 01m 05s)
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MediaModeration on group0 gerrit:607327 (duration: 01m 04s)
* 18:08 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable click tracking in Vector on beta cluster gerrit:607136 IS.php (duration: 01m 05s)
* 18:06 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable click tracking in Vector on beta cluster gerrit:607136 IS-labs.php (duration: 01m 07s)
* 17:31 elukey: update archiva-ci user's password in Jenkins credentials plugin
* 16:56 elukey: update archiva-deploy user's password in Jenkins credentials plugin
* 16:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, feeds timed out, redo (duration: 05m 11s)
* 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, feeds timed out, redo
* 16:40 ppchelko@deploy1001: Finished deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, take 2 (duration: 14m 11s)
* 16:34 brennen@deploy1001: Finished scap: (no justification provided) (duration: 60m 22s)
* 16:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 ppchelko@deploy1001: Started deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, take 2
* 16:17 elukey: reimage db1108 to debian Buster - [[phab:T234826|T234826]]
* 15:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@386b736]: Revert (duration: 27m 21s)
* 15:38 brennen: previous scap sync for [[phab:T256151|T256151]] - [[gerrit:607379]] and [[gerrit:607380]]
* 15:36 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 100% into s6 [[phab:T255927|T255927]]', diff saved to https://phabricator.wikimedia.org/P11652 and previous config saved to /var/cache/conftool/dbconfig/20200624-153604-kormat.json
* 15:34 brennen@deploy1001: Started scap: (no justification provided)
* 15:25 ppchelko@deploy1001: Started deploy [restbase/deploy@386b736]: Revert
* 15:24 ppchelko@deploy1001: deploy aborted: Release updates to PCS endpoints (duration: 05m 04s)
* 15:20 jayme: rolling restart of swift-proxy on thanos-fe[2001-2003].codfw.wmnet,thanos-fe[1001-1003].eqiad.wmnet - [[phab:T256020|T256020]]
* 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@9686627]: Release updates to PCS endpoints
* 15:06 brennen: merging backports and running a full scap sync for UBN at [[phab:T256151|T256151]]
* 15:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:57 moritzm: rebooting deneb for kernel update
* 14:57 ema: rmlist teampractices [[phab:T255525|T255525]]
* 14:42 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on group0 - [[phab:T249261|T249261]] (duration: 01m 06s)
* 13:28 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: [config] 603167 Remove TranslationNotifications user settings 1/2 (2nd attempt, now with correct file) (duration: 01m 06s)
* 13:23 marostegui: Deploy schema change on s6 eqiad primary master - [[phab:T238966|T238966]]
* 12:59 jbond42: update metamonitoring to use icinga-extmon.wikimedia.org
* 12:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1005.eqiad.wmnet
* 12:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1006.eqiad.wmnet
* 12:19 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1006.eqiad.wmnet
* 12:19 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1005.eqiad.wmnet
* 12:19 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2005.codfw.wmnet
* 12:19 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2006.codfw.wmnet
* 12:17 akosiaris: depool/drain/reboot/pool kubernetes1005,6 for CPU capacity increase [[phab:T256236|T256236]]
* 12:14 akosiaris: reboot kubernetes2005,6 for CPU capacity increase [[phab:T256236|T256236]]
* 12:11 akosiaris: depool kubernetes2005,kubernetes2006 for CPU capacity increase [[phab:T256236|T256236]]
* 12:10 akosiaris: depool kubernetes2005,kubernetes2006 for CPU capacity increase
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2006.codfw.wmnet
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2005.codfw.wmnet
* 12:04 awight: EU vegan BACON cooked
* 12:03 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/GrowthExperiments: BACON: [[gerrit:607453{{!}}Help panel home screen menu item fixes (T255254)]] (duration: 01m 06s)
* 11:40 nikerabbit@deploy1001: Synchronized private/PrivateSettings.php: Remove TranslationNotifications user settings 3/2 (duration: 01m 06s)
* 11:35 nikerabbit@deploy1001: Synchronized private/readme.php: [config] 607414 Remove TranslationNotifications user settings 2/2 (duration: 01m 04s)
* 11:28 nikerabbit@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [config] 603167 Remove TranslationNotifications user settings 1/2 (duration: 01m 03s)
* 11:09 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: BACON: [[gerrit:605255{{!}}TwoColConflict: Talk page small deployment CommonSettings.php (T254458)]] (duration: 01m 17s)
* 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:38 marostegui: Stop haproxy on dbproxy1003 [[phab:T256216|T256216]]
* 10:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:01 volans: Production management IP allocation must be done from Netbox from now on, see https://wikitech.wikimedia.org/wiki/DNS/Netbox#Cutoff_dates
* 09:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 75% into s6 [[phab:T255927|T255927]]', diff saved to https://phabricator.wikimedia.org/P11648 and previous config saved to /var/cache/conftool/dbconfig/20200624-095338-kormat.json
* 09:50 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:36 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 50% into s6 [[phab:T255927|T255927]]', diff saved to https://phabricator.wikimedia.org/P11647 and previous config saved to /var/cache/conftool/dbconfig/20200624-093624-kormat.json
* 09:13 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:40 moritzm: prune remaining nginx packages on mw* servers [[phab:T255565|T255565]]
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 20% into s6 [[phab:T255927|T255927]]', diff saved to https://phabricator.wikimedia.org/P11645 and previous config saved to /var/cache/conftool/dbconfig/20200624-083120-kormat.json
* 08:06 moritzm: re-enable puppet in eqiad
* 08:04 marostegui@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:04 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:00 moritzm: disable puppet in eqiad to unblock puppetdb1002 VM migration
* 07:22 gehel: restarting blazegraph on wdqs1007
* 06:53 moritzm: draining ganeti1009 for eventual reboot
* 06:28 XioNoX: enable peering BGP sessions on AMS-IX - [[phab:T253970|T253970]]
* 05:59 XioNoX: disable peering BGP sessions on AMS-IX - [[phab:T253970|T253970]]
* 05:34 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:33 marostegui@cumin2001: START - Cookbook sre.hosts.decommission
* 05:14 marostegui: Remove grants from dbproxy1008 - [[phab:T231280|T231280]] [[phab:T255406|T255406]]
* 05:03 marostegui: Remove revision triggers from db1125:3316
* 05:02 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1085 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P11643 and previous config saved to /var/cache/conftool/dbconfig/20200624-050235-marostegui.json
* 04:53 marostegui: Reload haproxy on dbproxy1012 and dbproxy1014
* 00:35 ejegg: restarted fundraising jobs on main CiviCRM box
* 00:33 ejegg: updated Fundraising CiviCRM from {{Gerrit|f01b036128}} to {{Gerrit|52a32f2d66}}
 
== 2020-06-23 ==
* 23:16 wkandek: releases1002 is back after being moved to row D ([[phab:T255590|T255590]])
* 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:35 ejegg: disabled fundraising jobs on civi1001 for testing on civi2001
* 22:24 wkandek@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:13 AndyRussG: updated payments-wiki from {{Gerrit|5fd4eb1519}} to {{Gerrit|28ad76dcd7}}
* 22:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:23 wkandek@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:23 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 21:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:22 wkandek@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 21:22 wkandek@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 21:22 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:15 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 21:14 wkandek@cumin1001: START - Cookbook sre.hosts.decommission
* 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on all wikis - take 2 - [[phab:T238230|T238230]] (duration: 01m 06s)
* 19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on all wikis - [[phab:T238230|T238230]] (duration: 01m 05s)
* 19:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.38
* 18:55 mutante: gerrit1001 (prod) - restarting gerrit service to verify config changes
* 18:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on group0 - [[phab:T238230|T238230]] (duration: 01m 06s)
* 18:24 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254925|T254925]] [[phab:T246489|T246489]] (duration: 01m 06s)
* 18:04 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.38 (duration: 85m 53s)
* 16:39 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.38
* 16:01 brennen: 1.35.0-wmf.38 was branched at {{Gerrit|a35f7318}} for https://phabricator.wikimedia.org/T254175
* 15:47 moritzm: prune nginx packages on mwdebug hosts [[phab:T255565|T255565]]
* 15:37 moritzm: prune nginx packages on mw1380-mw1412 [[phab:T255565|T255565]]
* 15:28 moritzm: installing libvpx security updates
* 15:27 mutante: removing ganeti VM xhgui1001 from eqiad row_A, will recreate in another row for rebalancing VMs between rows ([[phab:T180761|T180761]] [[phab:T238098|T238098]])
* 15:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:12 mutante: removing ganeti VM releases1002 in eqiad row_A - will recreate in another row to re-balance ([[phab:T255590|T255590]])
* 15:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:56 moritzm: failover ganeti master in eqiad to ganeti1011
* 14:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:48 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T250887|T250887]] (duration: 00m 58s)
* 14:08 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@db7fd80]: Update recommendation-api to {{Gerrit|7e00177}} (duration: 03m 13s)
* 14:05 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@db7fd80]: Update recommendation-api to {{Gerrit|7e00177}}
* 13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:54 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:34 moritzm: draining ganeti1012 for eventual reboot
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:56 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:54 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 12:45 moritzm: draining ganeti1011 for eventual reboot
* 12:45 marostegui: Deploy schema change on s6 codfw master (lag will appear on codfw) - [[phab:T253276|T253276]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:35 awight: EU BACON cooked
* 11:34 awight@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/TwoColConflict/: BACON: [[gerrit:607248{{!}}Fix broken copy link in JS mode (T253724)]] (duration: 00m 57s)
* 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: test commons: Use the database name in the Wikibase entity source config (duration: 00m 59s)
* 11:04 moritzm: draining ganeti1008 for eventual reboot
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:38 moritzm: temporarily shut down xhgui1001/releases1002 to reshuffle Ganeti instances for reboots
* 10:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:22 kormat: reimaging db1088 to buster [[phab:T250666|T250666]]
* 10:03 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:01 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 09:48 jbond42: add new CI check for cloud yaml data https://gerrit.wikimedia.org/r/c/operations/puppet/+/606444/
* 09:46 jynus: stopping and reimaging db2101 into buster [[phab:T254871|T254871]]
* 09:32 marostegui: Reload haproxy on dbproxy1012 and dbproxy1014 to test db1097 as secondary for 24h [[phab:T254556|T254556]]
* 08:46 ema: mwmaint1002: add uid=abban,ou=people,dc=wikimedia,dc=org to group 'nda' [[phab:T255775|T255775]]
* 08:38 XioNoX: re-enable peering BGP sessions on AMS-IX - [[phab:T253970|T253970]]
* 08:03 moritzm: draining ganeti1007 for eventual reboot
* 07:58 XioNoX: restart scs-a8-eqiad - [[phab:T256101|T256101]]
* 07:51 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 07:42 marostegui: Deploy schema change on db1088
* 07:30 marostegui: Reimage db2133 (m2 codfw master) to Buster (this will trigger haproxy IRC alert) [[phab:T250666|T250666]]
* 07:01 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1118', diff saved to https://phabricator.wikimedia.org/P11637 and previous config saved to /var/cache/conftool/dbconfig/20200623-070120-marostegui.json
* 06:06 XioNoX: disable peering BGP sessions on AMS-IX - [[phab:T253970|T253970]]
* 05:24 marostegui: Compress InnoDB on db1080 [[phab:T254462|T254462]]
* 05:23 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1080 for InnoDB compression', diff saved to https://phabricator.wikimedia.org/P11636 and previous config saved to /var/cache/conftool/dbconfig/20200623-052350-marostegui.json
* 05:22 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11635 and previous config saved to /var/cache/conftool/dbconfig/20200623-052254-marostegui.json
* 05:12 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11634 and previous config saved to /var/cache/conftool/dbconfig/20200623-051159-marostegui.json
* 05:03 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11633 and previous config saved to /var/cache/conftool/dbconfig/20200623-050314-marostegui.json
 
== 2020-06-22 ==
* 23:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch for [[phab:T247330|T247330]] (duration: 00m 56s)
* 23:36 catrope@deploy1001: Synchronized dblists/: Close trwikinews ([[phab:T247330|T247330]]) (duration: 00m 58s)
* 23:28 RoanKattouw: Synchronized wmf-config/InitialiseSettings.php: Create rollbacker group on elwiktionary ([[phab:T255569|T255569]]) (typoed the task number before)
* 23:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create rollbacker group on elwiktionary ([[phab:T225569|T225569]]) (duration: 00m 56s)
* 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add localized sitename for bewikibooks ([[phab:T253962|T253962]]) (duration: 00m 57s)
* 23:16 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add domains to wgCopyUploadsDomains ([[phab:T255336|T255336]], [[phab:T255363|T255363]], [[phab:T255386|T255386]], [[phab:T255313|T255313]]) (duration: 01m 01s)
* 22:39 bstorm_: downtimed labstore1005 to prevent an alert during puppet merge [[phab:T253353|T253353]]
* 22:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:35 volans@cumin1001: START - Cookbook sre.dns.netbox
* 22:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@f2002c8]: bump glent jar to 0.2.2 (duration: 00m 56s)
* 22:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@f2002c8]: bump glent jar to 0.2.2
* 22:12 volans: cleanup interfaces and addresses in Netbox for offline servers - [[phab:T233183|T233183]]
* 21:59 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6e7f9f7]: bump glent jar to 0.2.2 (duration: 00m 18s)
* 21:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6e7f9f7]: bump glent jar to 0.2.2
* 17:19 mutante: gerrit1002 - let puppet remove [database] section from config; restart gerrit another time
* 17:14 mutante: gerrit1002 (gerrit-test): re-enabled puppet, restarted gerrit service
* 16:58 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:49 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:48 moritzm: installing mutt security updates
* 14:47 Amir1: creating shnwiktionary is done
* 14:44 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 58s)
* 14:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:41 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating shnwiktionary ([[phab:T253029|T253029]]) (duration: 00m 56s)
* 14:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating shnwiktionary ([[phab:T253029|T253029]]) (duration: 00m 56s)
* 14:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:37 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating shnwiktionary ([[phab:T253029|T253029]])
* 14:36 ladsgroup@deploy1001: Synchronized dblists: Creating shnwiktionary ([[phab:T253029|T253029]]) (duration: 00m 58s)
* 14:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:59 moritzm: re-enabling Puppet in codfw
* 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:51 moritzm: disable Puppet in codfw to reduce puppetdb2002 memory activity, unblocking the migration of the Ganeti instance for a reboot
* 13:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt and set wgEventLoggingServiceUri for all wikis - [[phab:T238230|T238230]] (duration: 00m 58s)
* 13:11 marostegui: Stop MySQL on db2078 instances
* 12:53 vgutierrez: upgrade to trafficserver 8.0.8~rc0-1wm1 on cp5006 and cp5012
* 12:45 moritzm: draining ganeti2007 for eventual reboot
* 12:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:31 akosiaris: failover logstash2023 from ganeti2007->ganeti2023 for migration_downtime change to apply
* 12:26 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 01m 25s)
* 12:24 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
* 12:22 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 00m 03s)
* 12:22 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
* 11:53 Urbanecm: EU B&C window done
* 11:50 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/VisualEditor/modules/: Backport: {{Gerrit|0a08066}}: Revert "Allow generic params to be passed to getWikitextFragment" ([[phab:T255785|T255785]]) (duration: 00m 58s)
* 11:45 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P11627 and previous config saved to /var/cache/conftool/dbconfig/20200622-114554-marostegui.json
* 11:40 moritzm: draining ganeti2008 for eventual reboot
* 11:37 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 00m 28s)
* 11:37 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
* 11:34 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11625 and previous config saved to /var/cache/conftool/dbconfig/20200622-113401-marostegui.json
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|74e8295}}: IS: Cleanup some redundant rows (duration: 00m 56s)
* 11:29 Urbanecm: Run namespaceDupes.php for zh* projects ([[phab:T165593|T165593]])
* 11:24 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11623 and previous config saved to /var/cache/conftool/dbconfig/20200622-112451-marostegui.json
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|db952ba}}: Add zh-hans and zh-hant translation of Module and Module_talk aliases for all Zh Projects ([[phab:T165593|T165593]]) (duration: 00m 56s)
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1301fd4}}: Add import sources for gomwiktionary ([[phab:T255098|T255098]]) (duration: 00m 57s)
* 11:08 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11622 and previous config saved to /var/cache/conftool/dbconfig/20200622-110806-marostegui.json
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|defa81e}}: Disable NS_USER(_TALK) search engine indexing on trwiki ([[phab:T255538|T255538]]) (duration: 00m 58s)
* 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:606985{{!}} Bumping portals to master (606985)]] (duration: 00m 56s)
* 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:606985{{!}} Bumping portals to master (606985)]] (duration: 01m 12s)
* 09:58 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:56 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 09:33 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1094 for reimage', diff saved to https://phabricator.wikimedia.org/P11621 and previous config saved to /var/cache/conftool/dbconfig/20200622-093323-marostegui.json
* 09:31 godog: roll-restart logstash in codfw/eqiad to apply configuration change
* 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:33 moritzm: reimaging cumin1001 to buster [[phab:T245114|T245114]]
* 08:13 godog: extend prometheus codfw ops filesystem to 1TB
* 08:02 vgutierrez: upgrade to trafficserver 8.0.8~rc0-1wm1 on cp4026 and cp4032
* 08:02 vgutierrez: upload trafficserver 8.0.8~rc0-1wm1 to apt.wm.o (buster)
* 07:33 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:30 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 07:16 marostegui: Reimage db1117 (irc haproxy alerts will be triggered)
* 06:26 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:24 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 06:06 marostegui: Stop MySQL on dbstore1005 for reimage to Buster - [[phab:T254870|T254870]]
* 05:58 marostegui: Compress InnoDB on db1118 [[phab:T254462|T254462]]
* 05:51 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 05:43 marostegui: Stop haproxy on dbproxy1008 - [[phab:T255406|T255406]]
* 05:33 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1118 for reimage and InnoDB compression', diff saved to https://phabricator.wikimedia.org/P11617 and previous config saved to /var/cache/conftool/dbconfig/20200622-053334-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134', diff saved to https://phabricator.wikimedia.org/P11616 and previous config saved to /var/cache/conftool/dbconfig/20200622-053104-marostegui.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11615 and previous config saved to /var/cache/conftool/dbconfig/20200622-051730-marostegui.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11614 and previous config saved to /var/cache/conftool/dbconfig/20200622-051720-marostegui.json
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11613 and previous config saved to /var/cache/conftool/dbconfig/20200622-050259-marostegui.json
* 04:50 marostegui: Deploy schema change on s3 primary master with a big sleep between wikis - [[phab:T250066|T250066]]
* 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11612 and previous config saved to /var/cache/conftool/dbconfig/20200622-044853-marostegui.json
 
== 2020-06-20 ==
* 22:56 cdanis@cumin2001: dbctl commit (dc=all): 'db1088 seems to have crashed', diff saved to https://phabricator.wikimedia.org/P11611 and previous config saved to /var/cache/conftool/dbconfig/20200620-225624-cdanis.json
* 07:42 elukey: powercycle an-worker1093 - CPU soft lockup bug shown in mgmt console
* 07:36 elukey: powercycle an-worker1091 - CPU soft lockup bug shown in mgmt console
 
== 2020-06-19 ==
* 18:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt - [[phab:T238230|T238230]] (duration: 00m 59s)
* 16:07 mutante: ganeti4003 - rebooting install4001 - trying to bootstrap OS install from install2003
* 15:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:28 godog: roll-restart kibana to apply new settings
* 13:01 moritzm: installing cups security updates (client side libs/tools)
* 12:31 qchris: Disabling puppet on gerrit1002 (test instance) to do some more testing
* 12:14 godog: delete march indices from logstash 5 eqiad to free up space
* 12:12 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:10 marostegui@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 12:07 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:06 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 12:05 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 11:39 marostegui: Reimage db2116 db2119 db2130
* 10:55 moritzm: installing mesa security updates
* 10:49 godog: close april logstash indices on logstash 5 eqiad
* 10:45 moritzm: installing tomcat8 security updates
* 10:38 jayme: imported chartmuseum_0.12.0-1 to buster-wikimedia
* 10:24 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11604 and previous config saved to /var/cache/conftool/dbconfig/20200619-102447-marostegui.json
* 10:21 godog: start closing logstash indices for 2020.03 in elastic 5 eqiad
* 09:22 godog: restart elasticsearch on logstash1010
* 09:14 apergos: rsync from dumpsdata1003 as root to labstore1007 of dumps output files to catch up, with --bwlimit=160000 up from 80000
* 08:45 volans: backup netbox and run one-time script to reserve first IPs on all infra prefixes on Netbox - [[phab:T233183|T233183]]
* 08:45 godog: roll restart elasticsearch_5@production-logstash-eqiad
* 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:15 godog: roll-restart logstash elk5 for "JVM GC Old generation-s runs" alert
* 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:59 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1093', diff saved to https://phabricator.wikimedia.org/P11601 and previous config saved to /var/cache/conftool/dbconfig/20200619-075907-marostegui.json
* 07:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:44 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P11600 and previous config saved to /var/cache/conftool/dbconfig/20200619-074420-marostegui.json
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:02 moritzm: rebooting ganeti nodes in eqiad for kernel security updates
* 06:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 06:47 moritzm: force reinstall of memcached 1.6 deb packages to ensure that the override is used in addition to the unmodified systemd unit from the deb [[phab:T233933|T233933]]
* 06:39 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:36 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 06:20 marostegui: Stop mysql on db2132 to reimage m1 codfw master - [[phab:T254556|T254556]]
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2075 db2111', diff saved to https://phabricator.wikimedia.org/P11599 and previous config saved to /var/cache/conftool/dbconfig/20200619-061922-marostegui.json
* 06:05 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:02 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:01 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 06:00 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11598 and previous config saved to /var/cache/conftool/dbconfig/20200619-055430-marostegui.json
* 05:41 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2075 and db2111 for reimage', diff saved to https://phabricator.wikimedia.org/P11597 and previous config saved to /var/cache/conftool/dbconfig/20200619-054118-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108', diff saved to https://phabricator.wikimedia.org/P11596 and previous config saved to /var/cache/conftool/dbconfig/20200619-053402-marostegui.json
* 05:25 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:23 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 for reimage', diff saved to https://phabricator.wikimedia.org/P11595 and previous config saved to /var/cache/conftool/dbconfig/20200619-044440-marostegui.json
* 04:39 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P11594 and previous config saved to /var/cache/conftool/dbconfig/20200619-043956-marostegui.json
* 04:35 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11593 and previous config saved to /var/cache/conftool/dbconfig/20200619-043554-marostegui.json
 
== 2020-06-18 ==
* 22:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on all wikis - [[phab:T249261|T249261]] (duration: 00m 56s)
* 21:14 volans: start check-homer-diff.service on cumin2001 after merging the fix r/606526
* 20:17 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on all wikis - [[phab:T249261|T249261]] (duration: 00m 57s)
* 19:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on group1 wikis - [[phab:T249261|T249261]] (duration: 00m 57s)
* 18:53 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:16 wkandek@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 17:14 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 17:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 16:51 maryum: reindex suspended until deployment of code
* 16:49 hnowlan: Shut off non-dockerised deployment-prep instance of changeprop
* 16:15 maryum: reindexing French wiki in Elasticsearch
* 15:37 Reedy: created bot_passwords tables on officewiki and otrs_wikiwiki [[phab:T254925|T254925]] [[phab:T246489|T246489]]
* 15:34 moritzm: installing harfbuzz security updates
* 15:23 moritzm: installing Ruby 2.1 security updates
* 15:15 moritzm: installing python-django security updates (packaged buster version)
* 15:04 moritzm: installing bind updates on jessie (client side tools/libs)
* 14:19 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11591 and previous config saved to /var/cache/conftool/dbconfig/20200618-141941-marostegui.json
* 14:14 moritzm: failover ganeti master in codfw to ganeti2021
* 14:03 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P11590 and previous config saved to /var/cache/conftool/dbconfig/20200618-140352-marostegui.json
* 14:02 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11589 and previous config saved to /var/cache/conftool/dbconfig/20200618-140203-marostegui.json
* 13:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:53 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 13:52 akosiaris: restart logstash2005 for applying an increased ganeti migration_downtime of 10k
* 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:52 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P11586 and previous config saved to /var/cache/conftool/dbconfig/20200618-125216-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master as es1024 is fully repooled now', diff saved to https://phabricator.wikimedia.org/P11585 and previous config saved to /var/cache/conftool/dbconfig/20200618-124801-marostegui.json
* 12:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:05 kormat: reimaging db1077 for final test [[phab:T251768|T251768]]
* 11:51 jbond@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: (no justification provided) (duration: 01m 00s)
* 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2076', diff saved to https://phabricator.wikimedia.org/P11583 and previous config saved to /var/cache/conftool/dbconfig/20200618-094001-marostegui.json
* 09:39 akosiaris: update wikifeeds to latest chart version in codfw
* 09:39 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:38 marostegui@cumin2001: dbctl commit (dc=all): 'Repool es2022', diff saved to https://phabricator.wikimedia.org/P11582 and previous config saved to /var/cache/conftool/dbconfig/20200618-093803-marostegui.json
* 09:38 akosiaris: uncordon kubernetes20<nowiki>{</nowiki>07..14<nowiki>}</nowiki> and kubernetes10<nowiki>{</nowiki>07..14<nowiki>}</nowiki>. Nodes are now fully put in rotation and ready to receive production traffic
* 09:34 marostegui: Deploy schema change on s3 codfw master (this will create lag on codfw) - [[phab:T250066|T250066]]
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:30 godog: temp stop logstash on elk7 to test 8 pipeline workers - [[phab:T255243|T255243]]
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:09 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:06 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 09:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:59 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool es1025', diff saved to https://phabricator.wikimedia.org/P11581 and previous config saved to /var/cache/conftool/dbconfig/20200618-085927-marostegui.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:50 ayounsi@cumin2001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
* 08:49 ayounsi@cumin2001: START - Cookbook sre.network.prepare-upgrade
* 08:49 ayounsi@cumin2001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 08:49 ayounsi@cumin2001: START - Cookbook sre.network.prepare-upgrade
* 08:49 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11580 and previous config saved to /var/cache/conftool/dbconfig/20200618-084929-marostegui.json
* 08:47 marostegui@cumin2001: dbctl commit (dc=all): 'Depool es2022 for reimage', diff saved to https://phabricator.wikimedia.org/P11578 and previous config saved to /var/cache/conftool/dbconfig/20200618-084720-marostegui.json
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:37 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11577 and previous config saved to /var/cache/conftool/dbconfig/20200618-083749-marostegui.json
* 08:25 elukey: change archiva-ci password in archiva
* 08:24 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11576 and previous config saved to /var/cache/conftool/dbconfig/20200618-082432-marostegui.json
* 08:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:10 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:41 marostegui: Reimage es1025
* 07:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:34 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11574 and previous config saved to /var/cache/conftool/dbconfig/20200618-073414-marostegui.json
* 07:33 ayounsi@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:25 ayounsi@cumin2001: START - Cookbook sre.dns.netbox
* 07:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:22 moritzm: rolling reboot of ganeti servers in codfw
* 07:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 04:50 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11573 and previous config saved to /var/cache/conftool/dbconfig/20200618-045047-marostegui.json
 
== 2020-06-17 ==
* 23:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0e7079d}}: Install DiscussionTools on all wikis (attempt 2) ([[phab:T252264|T252264]]; [[phab:T253943|T253943]]) (duration: 00m 56s)
* 23:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/DiscussionTools/includes/Hooks.php: {{Gerrit|ff01083}}: Use $wgLocaltimezone global instead of request context ([[phab:T255704|T255704]]) (duration: 00m 57s)
* 23:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/DiscussionTools/includes/Hooks.php: {{Gerrit|4551d29}}: Use $wgLocaltimezone global instead of request context ([[phab:T252264|T252264]]; [[phab:T253943|T253943]]; [[phab:T255704|T255704]]) (duration: 00m 58s)
* 23:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@79fb82f]: 0.3.39 (duration: 14m 38s)
* 22:47 ryankemper@deploy1001: Started deploy [wdqs/wdqs@79fb82f]: 0.3.39
* 21:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:32 hashar: Fixed up zuul-merger on contint1001 due to some faulty hotfix
* 20:08 hashar: Stopped zuul-merger on contint1001
* 19:21 marostegui: Deploy schema change on s6 codfw master [[phab:T238966|T238966]]
* 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P11572 and previous config saved to /var/cache/conftool/dbconfig/20200617-191723-marostegui.json
* 19:11 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 18:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6640d6f] (thin): Quick fix for data quality bundles (THIN) (duration: 00m 10s)
* 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@6640d6f] (thin): Quick fix for data quality bundles (THIN)
* 18:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:44 milimetric@deploy1001: Finished deploy [analytics/refinery@6640d6f]: Quick fix for data quality bundles (duration: 27m 55s)
* 18:41 Urbanecm: Morning B&C window done
* 18:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|96153f9}}: Add temporary logging for mediamoderation ([[phab:T247943|T247943]]) (duration: 00m 56s)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|ae76450}}: Install DiscussionTools on all wikis ([[phab:T252264|T252264]]; [[phab:T253943|T253943]]) (duration: 00m 34s)
* 18:22 urbanecm@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 18:21 urbanecm@deploy1001: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 18:16 milimetric@deploy1001: Started deploy [analytics/refinery@6640d6f]: Quick fix for data quality bundles
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c9f6452}}: Set DiscussionToolsEnableVisual to true by default ([[phab:T251654|T251654]]) (duration: 00m 56s)
* 18:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:04 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 16:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on group0 wikis - [[phab:T249261|T249261]] (duration: 00m 56s)
* 16:00 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P11571 and previous config saved to /var/cache/conftool/dbconfig/20200617-160013-marostegui.json
* 15:28 godog: temp bump logstash7 workers to 8 and temp stop logstash - [[phab:T255243|T255243]]
* 15:17 jforrester@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T247943|T247943]] Add API key and recipient config for MediaModeration (duration: 00m 55s)
* 15:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
* 15:11 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[5-9].codfw.wmnet
* 15:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T247943|T247943]] Install MediaModeration extension - III: Install where enabled (duration: 00m 56s)
* 15:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
* 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 15:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw233[5-9].codfw.wmnet
* 14:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/GrowthExperiments/modules/help/ext.growthExperiments.HelpPanelProcessDialog.js: [[phab:T255607|T255607]] Fix help panel sizing logic (duration: 00m 56s)
* 14:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:52 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 mdholloway: rolled back recommendation-api deployment due to canary endpoint check failure ([[phab:T255683|T255683]])
* 14:44 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to {{Gerrit|db97742}} (duration: 01m 16s)
* 14:43 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to {{Gerrit|db97742}}
* 14:30 akosiaris: redrain kubernetes1007-14
* 14:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 mutante: disabling puppet on icinga to avoid alert spam when adding new appservers
* 14:25 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:22 akosiaris: uncordon kubernetes10<nowiki>{</nowiki>07..14<nowiki>}</nowiki> again
* 14:13 mutante: generating new mcrouter certs for mw2335 - mw2339 ([[phab:T247021|T247021]])
* 14:02 mutante: rebooting mw2335 through mw2339 (not in service)
* 13:51 XioNoX: cleanup msw1-codfw interfaces
* 13:44 akosiaris: redrain kubernetes1007-14
* 13:37 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:35 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on testwiki version 1.1.0 - [[phab:T249261|T249261]] (duration: 00m 58s)
* 13:30 moritzm: upgrade remaining parsoid nodes to PHP 7.2.31
* 13:21 jbond42: re-enable puppet on C:memcached nodes
* 13:04 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 marostegui: The 13:02 db1129 depool commit was meant to be a repool (wrong commit message)
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.37
* 13:03 jbond42: disable puppet on C:memcache to deploy a new change
* 13:02 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11567 and previous config saved to /var/cache/conftool/dbconfig/20200617-130236-marostegui.json
* 13:02 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:00 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:00 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:00 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
* 12:54 hnowlan: upgraded cpjobqueue to newer container image, rolled back
* 12:40 marostegui@cumin2001: dbctl commit (dc=all): 'Add db2091 to s8 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11566 and previous config saved to /var/cache/conftool/dbconfig/20200617-124034-marostegui.json
* 12:32 hnowlan: Removed remaining changeprop systemd components from scb
* 12:06 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2076 to remove triggers from sanitarium [[phab:T238966|T238966]]', diff saved to https://phabricator.wikimedia.org/P11565 and previous config saved to /var/cache/conftool/dbconfig/20200617-120622-marostegui.json
* 11:59 Amir1: not today, just EU noon
* 11:59 Amir1: B&C is done for today
* 11:58 ladsgroup@deploy1001: Synchronized wmf-config/config/trwikisource.yaml: [[gerrit:605656{{!}}Change sidebar upload link destination for tr.wikisource (T253490)]] (duration: 01m 03s)
* 11:55 ladsgroup@deploy1001: Synchronized dblists/commonsuploads.dblist: [[gerrit:605656{{!}}Change sidebar upload link destination for tr.wikisource (T253490)]] (duration: 01m 04s)
* 11:48 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:605652{{!}}Add extended-confirmed group and restriction level for rowiki (T254471)]] (duration: 01m 04s)
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 for reimage, give weight to es1023 (es5 master)', diff saved to https://phabricator.wikimedia.org/P11563 and previous config saved to /var/cache/conftool/dbconfig/20200617-113026-marostegui.json
* 11:23 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/GrowthExperiments/extension.json: [[gerrit:606122{{!}}Fix NewcomerTask schema (T255597)]] (duration: 01m 04s)
* 11:18 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments/extension.json: [[gerrit:606121{{!}}Fix NewcomerTask schema (T255597)]] (duration: 01m 06s)
* 11:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:606075{{!}}Set hiwiktionary timezone to Asia/Kolkata (T255531)]] (duration: 01m 05s)
* 10:48 marostegui@cumin2001: dbctl commit (dc=all): 'Remove db2091 from dbctl in s2 and s4', diff saved to https://phabricator.wikimedia.org/P11562 and previous config saved to /var/cache/conftool/dbconfig/20200617-104816-marostegui.json
* 10:40 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:38 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 10:31 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.37 (duration: 01m 04s)
* 10:30 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.37
* 09:44 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:42 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 09:40 hnowlan: killing stale changeprop instances running on scb hosts
* 09:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/Flow/: [[phab:T255608|T255608]] Revert 'Hooks: Use PageMoveComplete instead of TitleMoveCompleting' (duration: 01m 05s)
* 09:15 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11558 and previous config saved to /var/cache/conftool/dbconfig/20200617-091509-marostegui.json
* 09:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/includes/HookContainer/DeprecatedHooks.php: [[phab:T255608|T255608]] Revert 'Hard deprecate the  hook' (duration: 01m 05s)
* 09:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T247943|T247943]] Install MediaModeration extension - II: Add flag to IS (duration: 01m 05s)
* 08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:52 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:47 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11557 and previous config saved to /var/cache/conftool/dbconfig/20200617-084751-marostegui.json
* 08:44 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11556 and previous config saved to /var/cache/conftool/dbconfig/20200617-084402-marostegui.json
* 08:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/includes/EditPage.php: [[phab:T255177|T255177]] [[phab:T255614|T255614]] Do not return internal edit status from EditPage (duration: 01m 08s)
* 08:31 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11554 and previous config saved to /var/cache/conftool/dbconfig/20200617-083120-marostegui.json
* 08:30 godog: start logstash on logstash7 - [[phab:T255243|T255243]]
* 08:29 moritzm: prune nginx from remaining mw* servers in codfw [[phab:T255565|T255565]]
* 08:23 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:20 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:10 godog: stop logstash temporarily on logstash7 hosts to test increased es shards - [[phab:T255243|T255243]]
* 08:05 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1113:3315 db1113:3316', diff saved to https://phabricator.wikimedia.org/P11553 and previous config saved to /var/cache/conftool/dbconfig/20200617-080511-marostegui.json
* 07:53 elukey: reboot kafka-jumbo1009 for kernel upgrades
* 06:40 elukey: reboot krb1001 for kernel upgrades
* 06:24 elukey: reboot an-master100[1,2] for kernel upgrades
* 06:23 XioNoX: set lacp active on cr2-esams:ae2 - [[phab:T253970|T253970]]
* 06:15 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: test fast stale mode on testwiki [[phab:T250248|T250248]] (duration: 01m 17s)
* 06:03 elukey: reboot an-conf100[1-3] for kernel upgrades
* 05:45 elukey: reboot stat1007/8 for kernel upgrades
* 05:45 elukey: clean up old systemd timer config on an-coord1001 (came up after the last reboot)
* 05:42 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide:  (duration: 00m 05s)
* 05:42 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
* 05:34 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11552 and previous config saved to /var/cache/conftool/dbconfig/20200617-053421-marostegui.json
* 05:29 marostegui: Deploy schema change on s7 codfw (lag will appear) - [[phab:T250066|T250066]]
* 05:28 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11551 and previous config saved to /var/cache/conftool/dbconfig/20200617-052809-marostegui.json
* 05:22 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11550 and previous config saved to /var/cache/conftool/dbconfig/20200617-052202-marostegui.json
* 05:19 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11549 and previous config saved to /var/cache/conftool/dbconfig/20200617-051916-marostegui.json
* 05:10 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for reimage', diff saved to https://phabricator.wikimedia.org/P11548 and previous config saved to /var/cache/conftool/dbconfig/20200617-045105-marostegui.json
* 04:44 marostegui: Reload pt-kill on labsdb analytics host to pick up new config
* 04:38 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11547 and previous config saved to /var/cache/conftool/dbconfig/20200617-043826-marostegui.json
* 01:43 shdubsh: restart elasticsearch on logstash1011
 
== 2020-06-16 ==
* 23:43 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Deploying Netbox to netbox-dev [[phab:T253140|T253140]] (duration: 00m 05s)
* 23:43 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Deploying Netbox to netbox-dev [[phab:T253140|T253140]]
* 23:35 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: update ML models for ko and zh, drop ja (duration: 01m 00s)
* 23:34 ebernhardson@deploy1001: sync-file aborted: cirrus: update ML models for ko and zh, drop ja (duration: 00m 04s)
* 22:40 krinkle@deploy1001: Synchronized src/Noc/: (no justification provided) (duration: 01m 04s)
* 22:31 krinkle@deploy1001: Synchronized docroot/noc: (no justification provided) (duration: 01m 05s)
* 21:12 krinkle@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/WikimediaEvents/modules/: {{Gerrit|I67794c6c7192571}} (duration: 01m 04s)
* 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.37
* 20:41 foks: reset email and pw for CactusJack
* 20:32 brennen: rolling 1.35.0-wmf.37 back to group0
* 20:29 mutante: signing puppet cert requests for releases1002 and releases2002 - [[phab:T255590|T255590]]
* 19:24 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.37 (duration: 01m 04s)
* 19:23 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.37
* 19:18 otto@deploy1001: Started deploy [analytics/refinery@8b8ce6e]: deploying refinery source 0.0.127 for eventlogging -> eventgate migration - [[phab:T249261|T249261]]
* 19:15 brennen@deploy1001: Synchronized php-1.35.0-wmf.37/skins/Vector/resources/skins.vector.styles/: [[gerrit:605975{{!}}Restore Watchlist star]] (duration: 01m 05s)
* 19:03 brennen: CORRECTION: holding _1.35.0-wmf.37_ deploy to group1 for a few minutes while merging & testing fix for [[phab:T255574|T255574]]
* 19:01 brennen: holding 1.35.0-wmf.27 deploy to group1 for a few minutes while merging & testing fix for [[phab:T255574|T255574]]
* 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:52 qchris: Turning on puppet again on gerrit1002 to avoid having it lag too far behind.
* 18:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:18 mutante: mw2293 - scap pull (because Icinga reports mismatched MW versions)
* 18:01 crusnov@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:52 crusnov@cumin2001: START - Cookbook sre.ganeti.makevm
* 17:44 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@f4f5d7b]: airflow: adjust glent legal cutoff (duration: 01m 35s)
* 17:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@f4f5d7b]: airflow: adjust glent legal cutoff
* 17:32 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:03 herron: performing rolling reboots of kafka-main hosts for security updates [[phab:T254990|T254990]]
* 16:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:26 hnowlan: Updating changeprop to new container version with updated dependencies
* 16:07 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:04 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:02 elukey: reboot kafka-jumbo1008 for kernel upgrades
* 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11543 and previous config saved to /var/cache/conftool/dbconfig/20200616-154924-marostegui.json
* 15:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7d4458c]: Reduce glent maximum yarn resource usage to reasonable levels (duration: 00m 41s)
* 15:44 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7d4458c]: Reduce glent maximum yarn resource usage to reasonable levels
* 15:26 milimetric@deploy1001: Finished deploy [analytics/refinery@c652f62] (thin): Regular analytics weekly THIN train [analytics/refinery@c652f62] (duration: 00m 08s)
* 15:25 milimetric@deploy1001: Started deploy [analytics/refinery@c652f62] (thin): Regular analytics weekly THIN train [analytics/refinery@c652f62]
* 15:23 milimetric@deploy1001: Finished deploy [analytics/refinery@c652f62]: Regular analytics weekly train [analytics/refinery@c652f62] (duration: 07m 56s)
* 15:20 elukey: reboot kafka-jumbo1007 for kernel upgrades
* 15:15 moritzm: upgrading intel-microcode on jessie hosts
* 15:15 milimetric@deploy1001: Started deploy [analytics/refinery@c652f62]: Regular analytics weekly train [analytics/refinery@c652f62]
* 15:06 elukey: reboot an-coord1001 for kernel upgrades
* 14:49 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:45 moritzm: rebooting scandium for kernel security update
* 14:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:43 cdanis: repool eqiad [[phab:T243080|T243080]]
* 14:40 papaul: power off ms-be2018 for BBU replacement
* 14:33 cdanis: eqiad router upgrades completed! 🎉 [[phab:T243080|T243080]]
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:31 elukey: reboot druid100[7,8] for kernel upgrades
* 14:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11541 and previous config saved to /var/cache/conftool/dbconfig/20200616-141540-marostegui.json
* 14:14 cdanis: [[phab:T243080|T243080]] cdanis@re1.cr2-eqiad> request chassis routing-engine master switch
* 14:11 moritzm: removing stray nginx packages from mw canaries (mw1261-mw1265 and mw1276-mw1283) [[phab:T255565|T255565]]
* 14:06 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 13:56 cdanis: [[phab:T243080|T243080]] cdanis@re0.cr2-eqiad> request chassis routing-engine master switch
* 13:50 cdanis: cr2-eqiad: rebooting RE1 [backup] with new junos version [[phab:T243080|T243080]]
* 13:39 cdanis: cr2-eqiad: disable transit/peering BGP & bump fr MED [[phab:T243080|T243080]]
* 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db2092 [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P11535 and previous config saved to /var/cache/conftool/dbconfig/20200616-133241-marostegui.json
* 13:17 XioNoX: pfw3-eqiad rollback MED to cr1 to 0 - [[phab:T243080|T243080]]
* 13:12 XioNoX: add graceful-switchover to cr1-eqiad
* 13:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.37
* 13:06 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 cdanis: [[phab:T243080|T243080]] cdanis@re1.cr1-eqiad> request chassis routing-engine master switch
* 13:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:01 moritzm: rebooting mw2291-mw2334
* 12:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:47 jbond42: upload new memcache package with TLS to component/memcached16 in buster-wikimedia
* 12:42 XioNoX: pfw3-eqiad set MED to cr1 to 300 - [[phab:T243080|T243080]]
* 12:38 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:31 cdanis: [[phab:T243080|T243080]] cr1-eqiad: request chassis routing-engine master switch
* 12:31 cdanis: cr1-eqiad: request chassis routing-engine master switch
* 12:25 cdanis: cr1-eqiad: rebooting RE1 [backup] with new junos version [[phab:T243080|T243080]]
* 12:15 cdanis: cdanis@re0.cr1-eqiad# commit confirmed 2 comment "force VRRP failover [[phab:T243080|T243080]]"
* 12:14 cdanis: disable transit/peering & increase frack MED on cr1-eqiad [[phab:T243080|T243080]]
* 12:09 hnowlan@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:48 cdanis: depooling eqiad for router upgrade [[phab:T243080|T243080]]
* 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:40 hnowlan: roll-restarting restbase201[0-2] for cert updates
* 11:40 hnowlan@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:39 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:39 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 11:38 hnowlan@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:35 elukey: reboot an-druid100[1,2] for kernel upgrades
* 11:27 hnowlan: roll-restart restbase2009 for cert update
* 11:26 hnowlan@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:18 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T32405|T32405]] [[phab:T254731|T254731]] Drop mobile special casing of main page for simplewiki, itwikisource, vecwikisource (duration: 01m 05s)
* 11:15 moritzm: updating perf on stretch hosts
* 11:14 marostegui: Deploy MCR schema change on db2087:3316
* 11:09 moritzm: updating perf on buster
* 11:02 moritzm: rebooting mw2350-mw2376
* 11:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgActorTableSchemaMigrationStage, no longer read in core (duration: 01m 05s)
* 10:52 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgTagStatisticsNewTable, no longer read in core (duration: 01m 04s)
* 10:51 hnowlan: roll-restarting restbase101[6-8].eqiad.wmnet for cert updates
* 10:50 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 10:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgChangeTagsSchemaMigrationStage, no longer read in core (duration: 01m 06s)
* 10:26 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgCommentTableSchemaMigrationStage, no longer read in core (duration: 01m 07s)
* 09:54 volans: restarting netbox to pick up modified customscripts
* 09:14 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=eqiad
* 08:53 godog: roll restart prometheus eqiad ops to enable thanos upload
* 08:48 marostegui: Upgrade db2132
* 08:44 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:42 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
* 08:39 liw@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.37 (duration: 59m 05s)
* 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:09 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3bis) (duration: 00m 12s)
* 08:09 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3bis)
* 08:09 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3) (duration: 01m 37s)
* 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:07 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3)
* 07:59 volans@deploy1001: Finished deploy [homer/deploy@85e92b8]: Release v0.2.3 on cumin2001 now on buster (take 2) (duration: 00m 57s)
* 07:58 volans@deploy1001: Started deploy [homer/deploy@85e92b8]: Release v0.2.3 on cumin2001 now on buster (take 2)
* 07:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:40 liw@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.37
* 07:37 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.35 (duration: 01m 47s)
* 07:31 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.34 (duration: 11m 52s)
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 07:08 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 07:07 liw: 1.35.0-wmf.37 was branched at {{Gerrit|f856960f17b2a477640c5d848926c04f0d56196c}} for [[phab:T254174|T254174]]
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11526 and previous config saved to /var/cache/conftool/dbconfig/20200616-070651-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11525 and previous config saved to /var/cache/conftool/dbconfig/20200616-070450-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11524 and previous config saved to /var/cache/conftool/dbconfig/20200616-070429-marostegui.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11523 and previous config saved to /var/cache/conftool/dbconfig/20200616-070209-marostegui.json
* 06:57 marostegui: Compress InnoDB on db1134 [[phab:T254462|T254462]]
* 06:56 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1134 for InnoDB compression [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P11522 and previous config saved to /var/cache/conftool/dbconfig/20200616-065600-marostegui.json
* 06:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11521 and previous config saved to /var/cache/conftool/dbconfig/20200616-065412-marostegui.json
* 06:40 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:25 elukey: roll restart memcached on mc-gp* (gutter pools) to pick up new slab size distribution setting - [[phab:T252391|T252391]]
* 06:04 hashar: Restarted Zuul scheduler and merger on contint2001 for a couple of hotfixes # [[phab:T252310|T252310]] [[phab:T255424|T255424]]
* 05:54 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide:  (duration: 00m 05s)
* 05:54 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
* 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11520 and previous config saved to /var/cache/conftool/dbconfig/20200616-045958-marostegui.json
* 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11519 and previous config saved to /var/cache/conftool/dbconfig/20200616-045744-marostegui.json
* 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11518 and previous config saved to /var/cache/conftool/dbconfig/20200616-045636-marostegui.json
* 04:55 marostegui: Deploy schema change on db1147
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11517 and previous config saved to /var/cache/conftool/dbconfig/20200616-045451-marostegui.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11516 and previous config saved to /var/cache/conftool/dbconfig/20200616-044612-marostegui.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11515 and previous config saved to /var/cache/conftool/dbconfig/20200616-044409-marostegui.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11514 and previous config saved to /var/cache/conftool/dbconfig/20200616-044326-marostegui.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11513 and previous config saved to /var/cache/conftool/dbconfig/20200616-044126-marostegui.json
* 04:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11512 and previous config saved to /var/cache/conftool/dbconfig/20200616-044036-marostegui.json
* 04:37 marostegui: Deploy schema change on db1138
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11511 and previous config saved to /var/cache/conftool/dbconfig/20200616-043748-marostegui.json
* 00:28 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: limit HTTP client timeout [[phab:T245170|T245170]] (duration: 00m 56s)
* 00:25 tstarling@deploy1001: Synchronized wmf-config/set-time-limit.php: expose excimer timeout as a global variable [[phab:T245170|T245170]] (duration: 00m 56s)
* 00:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist (duration: 00m 45s)
* 00:16 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide:  (duration: 00m 04s)
* 00:16 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
* 00:16 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist
 
== 2020-06-15 ==
* 23:56 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: reducing connect timeout per [[phab:T105378|T105378]] (duration: 01m 00s)
* 23:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@eb0ac12]: Ship templated table names in HivePartitionRangeSensor (duration: 00m 49s)
* 23:30 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@eb0ac12]: Ship templated table names in HivePartitionRangeSensor
* 22:58 krinkle@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: {{Gerrit|If7e1613cbcf8}} (duration: 00m 56s)
* 22:57 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|If7e1613cbcf8}} (duration: 00m 59s)
* 22:02 bstorm_: downtimed puppet alerts for testing some changes on labstore1004/5
* 20:59 ebernhardson@deploy1001: Finished deploy [search/airflow@62a024b]: Add pydruid to airflow (duration: 00m 50s)
* 20:58 ebernhardson@deploy1001: Started deploy [search/airflow@62a024b]: Add pydruid to airflow
* 20:55 shdubsh: update mtail to 3.0.0~rc35 on the rest of the hosts - eqiad and esams
* 20:44 shdubsh: update mtail to 3.0.0~rc35 on cp nodes in eqiad and esams
* 20:30 shdubsh: update mtail to 3.0.0~rc35 on wtp in eqiad
* 19:35 shdubsh: update mtail to 3.0.0~rc35 on mw in eqiad
* 18:50 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@41186c8]: port glent from oozie to airflow (duration: 00m 39s)
* 18:50 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@41186c8]: port glent from oozie to airflow
* 18:28 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:605584]] [[phab:T254315|T254315]] test wikidata: Use the database name in the Wikibase entity source config (duration: 00m 58s)
* 17:56 krinkle@deploy1001: Synchronized wmf-config: {{Gerrit|I7721f4018b07dac}} (duration: 00m 58s)
* 17:55 krinkle@deploy1001: Synchronized wmf-config/ProductionServices.php: {{Gerrit|I7721f4018b07dac}} (duration: 00m 57s)
* 17:52 krinkle@deploy1001: Synchronized lib/: {{Gerrit|I7721f4018b07dac}} (duration: 00m 58s)
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11504 and previous config saved to /var/cache/conftool/dbconfig/20200615-153825-marostegui.json
* 15:37 marostegui: Deploy schema change on db1142
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11503 and previous config saved to /var/cache/conftool/dbconfig/20200615-153630-marostegui.json
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11502 and previous config saved to /var/cache/conftool/dbconfig/20200615-153546-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11501 and previous config saved to /var/cache/conftool/dbconfig/20200615-153344-marostegui.json
* 15:16 moritzm: upgrading wtp1025-wtp1027 to PHP 7.2.31
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11499 and previous config saved to /var/cache/conftool/dbconfig/20200615-150908-marostegui.json
* 15:07 marostegui: Deploy schema change on db1121 (and labs)
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11498 and previous config saved to /var/cache/conftool/dbconfig/20200615-150639-marostegui.json
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11497 and previous config saved to /var/cache/conftool/dbconfig/20200615-150148-marostegui.json
* 15:00 marostegui: Deploy schema change on db1144:3314
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11496 and previous config saved to /var/cache/conftool/dbconfig/20200615-145914-marostegui.json
* 14:55 XioNoX: delete VCP from msw1-codfw
* 14:24 marostegui: Deploy schema change on db2107 (s2 codfw master) - [[phab:T250066|T250066]]
* 14:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:09 elukey@cumin2001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 13:54 marostegui: Deploy schema change on db1100 (s5 master) - [[phab:T250066|T250066]]
* 13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:49 marostegui: Upgrade db2133
* 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:38 elukey@cumin2001: START - Cookbook sre.hadoop.roll-restart-workers
* 13:31 volans@deploy1001: Finished deploy [homer/deploy@ac7a4c6]: Release v0.2.3 on cumin2001 now on buster (duration: 01m 15s)
* 13:30 moritzm: rolling reboot on the ganeti cluster in esams (for kernel security updates and to pick up the network changes to provide instances with a public IP)
* 13:30 volans@deploy1001: Started deploy [homer/deploy@ac7a4c6]: Release v0.2.3 on cumin2001 now on buster
* 13:26 hashar: Started zuul-merger on contint1001 with newer virtualenv # [[phab:T255424|T255424]]
* 13:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:21 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=eqiad
* 13:20 hashar: Stopping zuul-merger on contint1001 to rebuild the virtualenv # [[phab:T255424|T255424]]
* 13:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3312, db2091:3314 - [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11495 and previous config saved to /var/cache/conftool/dbconfig/20200615-125856-marostegui.json
* 12:58 vgutierrez: upgrade acme-chief to version 0.26
* 12:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:46 vgutierrez: upload acme-chief 0.26 to apt.wm.o (buster) - [[phab:T255249|T255249]]
* 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:34 moritzm: rolling reboot on the ganeti cluster in eqsin (for security updates and to pick up the network changes to provide instances with a public IP)
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:11 marostegui: Upgrade db2134
* 12:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:57 moritzm: reimaging sretest1002 to validate the reimage script on Buster
* 11:43 marostegui: Reimage dbproxy2003 which points to m3-master.codfw.wmnet (not in use) - [[phab:T255408|T255408]]
* 11:40 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:605543{{!}}GrowthExperiments: Switch on guidance feature (T239181)]] (duration: 00m 57s)
* 11:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 11:07 hnowlan: regenerated certificates for restbase2009, restbase101[678], restbase201[012]. Did not roll-restart yet
* 11:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:54 moritzm: imported python-phabricator 0.7.0-2~wmf2 to apt.wikimedia.org/buster-wikimedia [[phab:T245114|T245114]]
* 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:605553{{!}} Bumping portals to master (605553)]] (duration: 00m 58s)
* 10:38 hnowlan: regenerated restbase2009's cassandra certificates
* 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:605553{{!}} Bumping portals to master (605553)]] (duration: 00m 58s)
* 10:16 jmm@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
* 10:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254820|T254820]] [enwikivoyage] Undeploy the Listings extension (duration: 01m 00s)
* 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:46 godog: run logstash benchmark on logstash1023
* 09:42 volans: deploying esams mgmt DNS records automatically generated by Netbox ( operations/dns/+/604136/ ) - [[phab:T233183|T233183]]
* 09:41 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:35 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:29 elukey: update analytics-in4/6 filters on cr1-cr2 eqiad to update the Druid term (new nodes added)
* 09:21 jbond42: offlining puppetmaster1003 and 2003 for reboot
* 09:17 XioNoX: reduce ae device-count from 10 to 3 on asw2-a/b/c-eqiad
* 09:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:11 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 08:55 marostegui: Deploy schema change on db2123 (s5 codfw master) - [[phab:T250066|T250066]]
* 08:50 kart_: Updated cxserver to 2020-06-10-044445-production ([[phab:T246319|T246319]], [[phab:T254959|T254959]])
* 08:46 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:42 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:39 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 08:34 moritzm: reimaging cumin2001 [[phab:T245114|T245114]]
* 08:22 marostegui: Switchover m3-master from dbproxy1008 to dbproxy1016 - [[phab:T202367|T202367]]
* 08:17 marostegui: Deploy schema change on db1131 (s6 master) - [[phab:T250066|T250066]]
* 08:09 moritzm: installing libexif security updates
* 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:46 XioNoX: standardize ae device-count on all routers
* 07:36 XioNoX: push new pfw firewall policies - [[phab:T255185|T255185]]
* 07:28 marostegui: Deploy schema change on db1093
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P11492 and previous config saved to /var/cache/conftool/dbconfig/20200615-072835-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P11491 and previous config saved to /var/cache/conftool/dbconfig/20200615-072742-marostegui.json
* 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-06-14 ==
* 13:51 qchris: Disabling puppet on gerrit1002 (test instance) to do some more upgrade testing
 
== 2020-06-13 ==
* 21:12 qchris: Enabling puppet on gerrit1002 (test instance). Done with testing for today.
* 12:51 herron: restarted logstash service on logstash1007, logstash1009
* 12:34 qchris: Disabling puppet on gerrit1002 (test instance) to do some more upgrade testing
* 12:33 godog: bounce logstash on logstash1008, GC death
 
== 2020-06-12 ==
* 17:44 herron: restarting logstash1011 elasticsearch instance
* 16:49 elukey: restart php-fpm and pool mw1384 - [[phab:T255282|T255282]]
* 16:33 elukey: (correct) depool again mw1384 - investigation will follow up in a task
* 16:32 elukey: depool again mw1348 - investigation will follow up in a task
* 15:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:44 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:27 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:25 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 elukey: repool mw1384 as test
* 14:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:30 akosiaris: bump cpu limits for changeprop another 50%
* 14:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:36 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:34 akosiaris: update changeprop in eqiad+codfw for higher CPU limits
* 13:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P11483 and previous config saved to /var/cache/conftool/dbconfig/20200612-131205-marostegui.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P11482 and previous config saved to /var/cache/conftool/dbconfig/20200612-124015-marostegui.json
* 12:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 11:52 filippo@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 11:23 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:19 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:15 moritzm: failover ganeti master in ulsfo to ganeti4003
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2080 and db2084 into s8 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11481 and previous config saved to /var/cache/conftool/dbconfig/20200612-111422-marostegui.json
* 11:11 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:07 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:02 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:58 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:39 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:36 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:33 moritzm: rolling restart of the ulsfo ganeti cluster
* 10:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:02 filippo@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:01 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Include db2084 in dbctl, depooled', diff saved to https://phabricator.wikimedia.org/P11480 and previous config saved to /var/cache/conftool/dbconfig/20200612-095855-marostegui.json
* 09:58 godog: roll-restart thanos-fe / thanos-be for microcode updates
* 08:51 elukey: restart gerrit on gerrit1001
* 08:48 elukey: update cr1/cr2 analytics filters for [[phab:T252767|T252767]] and [[phab:T252675|T252675]]
* 08:44 marostegui: Compress InnoDB on db2092 - [[phab:T254462|T254462]]
* 08:36 marostegui: Clone db2084 from db2080
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 to clone db2084', diff saved to https://phabricator.wikimedia.org/P11478 and previous config saved to /var/cache/conftool/dbconfig/20200612-083231-marostegui.json
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2084 from s4 and s5', diff saved to https://phabricator.wikimedia.org/P11477 and previous config saved to /var/cache/conftool/dbconfig/20200612-081455-marostegui.json
* 07:56 elukey: depool mw1384
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 from s4 and s5', diff saved to https://phabricator.wikimedia.org/P11476 and previous config saved to /var/cache/conftool/dbconfig/20200612-075202-marostegui.json
* 07:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:08 marostegui: Reimage db2086
* 07:07 elukey: depool/scap pull/pool mw1384
* 07:05 moritzm: installing intel-microcode security updates (regressions have been sorted out)
* 05:42 moritzm: installing stretch kernel security updates (no reboots yet)
* 05:40 moritzm: installing buster kernel security updates (no reboots yet)
* 04:54 marostegui: Deploy schema change on s6 codfw - [[phab:T250066|T250066]]
* 01:02 ejegg: updated payments-wiki from {{Gerrit|aceddff8b5}} to {{Gerrit|5fd4eb1519}}
* 00:10 Amir1: BACON is done
 
== 2020-06-11 ==
* 23:54 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/Wikibase: [[gerrit:604845{{!}}Fix entity id lookup for interwiki special page links (T255078)]] (duration: 00m 38s)
* 23:51 ladsgroup@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 23:43 ladsgroup@deploy1001: Synchronized wmf-config/extension-list: [[gerrit:604778{{!}}Remove ContributionTracking extension]] ([[phab:T255216|T255216]]), Part III (duration: 00m 57s)
* 23:42 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:604778{{!}}Remove ContributionTracking extension]] ([[phab:T255216|T255216]]), Part II (duration: 00m 58s)
* 23:38 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:604778{{!}}Remove ContributionTracking extension]] ([[phab:T255216|T255216]]), Part I (duration: 00m 59s)
* 23:37 Reedy: create cn_notice_regions on metawiki and testwiki [[phab:T252596|T252596]]
* 20:34 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:59 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.36
* 19:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:33 akosiaris: apply emergency sessionstore fixes in codfw as well
* 19:32 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 19:25 gilles@deploy1001: Finished deploy [performance/asoranking@0a096c4]: [[phab:T252424|T252424]] (duration: 00m 47s)
* 19:19 gilles@deploy1001: Started deploy [performance/asoranking@0a096c4]: [[phab:T252424|T252424]]
* 19:12 akosiaris: repool eqiad for sessionstore
* 19:12 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 19:10 akosiaris: remove the podaffinity restrictions for sessionstore in eqiad
* 19:10 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 19:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 18:08 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy-staging.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, reverse-proxy-staging.php (duration: 01m 06s)
* 18:06 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, IS-labs.php (duration: 01m 06s)
* 17:29 mbsantos@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:26 mbsantos@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:22 mbsantos@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:19 mbsantos@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:12 bstorm_: reboot for stretch upgrade on labstore1004 [[phab:T224582|T224582]]
* 16:49 bstorm_: doing stretch upgrade for labstore1004 [[phab:T224582|T224582]]
* 16:36 bstorm_: rebooting labstore1004 for upgrades [[phab:T224582|T224582]]
* 16:12 bstorm_: downtimed labstore1005 for upgrades on 1004 since that will alert as well [[phab:T224582|T224582]]
* 16:10 bstorm_: downtimed labstore1004 for upgrades [[phab:T224582|T224582]]
* 15:50 cstone: SmashPig revision changed from {{Gerrit|b9de3c7aac}} to {{Gerrit|2246685626}}
* 15:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:31 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:25 moritzm: installing buster kernel security updates (no reboots yet)
* 15:04 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 15:04 mforns@deploy1001: Finished deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966] (duration: 01m 39s)
* 15:04 root@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 15:04 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 15:02 mforns@deploy1001: Started deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966]
* 15:02 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:56 herron: bounced elasticsearch on logstash1012
* 14:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:40 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:37 herron: enabled VO incident resolution notification in global settings
* 14:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:31 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:30 godog: bounce logstash on logstash1009, apparent GC death spiral
* 14:03 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 14:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:03 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 14:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:35 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=eqiad
* 13:35 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=eqiad
* 12:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 12:36 elukey: updated pcc facts
* 12:28 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:28 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/includes/title/NamespaceInfo.php: [[phab:T253098|T253098]] NamespaceInfo::makeValidNamespace: Don't throw for -1 or -2 (duration: 01m 06s)
* 12:03 marostegui: Reimage es2023 (es5 codfw master)
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2075 [[phab:T254139|T254139]]', diff saved to https://phabricator.wikimedia.org/P11469 and previous config saved to /var/cache/conftool/dbconfig/20200611-115430-marostegui.json
* 11:46 marostegui: Deploy schema change on s6 codfw - [[phab:T250066|T250066]]
* 11:44 volans@deploy1001: Finished deploy [homer/deploy@df83901]: Release v0.2.3 (duration: 00m 25s)
* 11:44 volans@deploy1001: Started deploy [homer/deploy@df83901]: Release v0.2.3
* 11:36 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 11:36 matthiasmullie: EU BACON done
* 11:35 mlitn@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments: Help panel: Update guidance behavior rules (duration: 01m 06s)
* 11:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:28 kartik@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/ContentTranslation/modules/tools/mw.cx.tools.IssueTrackingTool.js: Backport: [[gerrit:604587{{!}}IssueTrackingTool: Fix js error in getCurrentNodeId method (T254965)]] (duration: 01m 07s)
* 11:08 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:04 mlitn@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/MachineVision: $aliases should be an array of strings, not AliasGroup objects (duration: 01m 07s)
* 10:47 moritzm: repooling mw1318,mw2139,mw2145,mw2147,mw2221,mw2219,mw2250,mw2350 (these were depooled, but seem all fine in Icinga and were probably just forgotten)
* 10:41 filippo@cumin1001: conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-swift
* 10:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-query
* 10:37 moritzm: installing buster kernel security updates (no reboots yet, on hold for regression-free microcode update)
* 10:32 godog: roll-restart pybal in eqiad lvs low-traffic
* 10:21 mutante: restarting gerrit on gerrit-replica (gerrit2001) - java.lang.OutOfMemoryError: Java heap space
* 10:21 Urbanecm: Run scap pull at mwdebug1001 to revert temporary changes
* 10:14 Urbanecm: Applying temporary changes on mwdebug1001
* 09:58 moritzm: upgrading netmon* to PHP 7.2.31
* 09:55 marostegui: Upgrade es2025
* 09:54 moritzm: upgrading mwmaint* to PHP 7.2.31
* 09:46 moritzm: upgrading labweb* to PHP 7.2.31
* 09:36 elukey: switch piwik.wikimedia.org from matomo1001 to matomo1002 (new buster node)
* 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:48 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:48 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 08:42 moritzm: imported memcached 1.6.6-1~wmf10u1
* 08:39 marostegui: Reimage es2024 to buster
* 08:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:23 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:23 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 08:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 07:59 moritzm: upgrading remaining job runners in eqiad to PHP 7.2.31
* 07:59 hashar: Restarted Zuul on contint2001 for config change # [[phab:T253263|T253263]]
* 07:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 07:34 moritzm: upgrading remaining app servers in eqiad to PHP 7.2.31
* 07:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:07 marostegui: Stop MySQL on dbstore1003 for reimage - [[phab:T254870|T254870]]
* 06:38 XioNoX: make asw2-esams interfaces Homer like - [[phab:T250429|T250429]]
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11467 and previous config saved to /var/cache/conftool/dbconfig/20200611-055536-marostegui.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11466 and previous config saved to /var/cache/conftool/dbconfig/20200611-052535-marostegui.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11465 and previous config saved to /var/cache/conftool/dbconfig/20200611-050446-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11464 and previous config saved to /var/cache/conftool/dbconfig/20200611-050200-marostegui.json
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11463 and previous config saved to /var/cache/conftool/dbconfig/20200611-045426-marostegui.json
* 04:50 marostegui: Deploy schema change on testwiki - [[phab:T254371|T254371]]
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 and slowly repool db1127 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11462 and previous config saved to /var/cache/conftool/dbconfig/20200611-044725-marostegui.json
* 03:13 shdubsh: removing WDQS-Streaming-Updater-POC metrics on graphite1004 - [[phab:T255044|T255044]]
* 02:43 tstarling@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/Wikibase/lib/includes/Store/EntityLinkTargetEntityIdLookup.php: investigate UBN [[phab:T255078|T255078]] (duration: 01m 07s)
 
== 2020-06-10 ==
* 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.36/includes/skins/SkinTemplate.php: [[phab:T255073|T255073]] (duration: 01m 07s)
* 22:14 eileen: civicrm revision changed from {{Gerrit|80a0d22350}} to {{Gerrit|f01b036128}}, config revision is {{Gerrit|a26d023633}}
* 21:23 akosiaris: increase memory/cpu limits for proton
* 21:23 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 21:11 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 21:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:06 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:45 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:33 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:15 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:04 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:46 herron: bouncing elasticsearch on logstash1011
* 19:01 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use EventRelayerNull for wikitech, gerrit:604469 (duration: 01m 05s)
* 18:54 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/VisualEditor/: {{Gerrit|8958860}}: Make VisualEditorDisableForAnons only hide the tabs, not disable the editor ([[phab:T253941|T253941]]) (duration: 01m 07s)
* 18:32 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/VisualEditor/: {{Gerrit|5f4c609}}: Make VisualEditorDisableForAnons only hide the tabs, not disable the editor ([[phab:T253941|T253941]]) (duration: 01m 14s)
* 16:40 godog: EDIT: in esams
* 16:39 godog: restart prometheus@ops in eqiad
* 16:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable HTCP purges everywhere, gerrit:603655 (duration: 01m 05s)
* 16:27 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 16:27 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:13 ema: correction: restart purged on all *cache_upload* hosts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ [[phab:T250781|T250781]] [[phab:T133821|T133821]]
* 16:12 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 16:12 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:12 ema: restart purged on all cache hosts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ [[phab:T250781|T250781]] [[phab:T133821|T133821]]
* 16:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:06 ema: cp3051: restart purged to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ [[phab:T250781|T250781]] [[phab:T133821|T133821]]
* 16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:38 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Send kafka purges everywhere, gerrit:603654 (duration: 01m 05s)
* 15:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 ema: remaining-cp (non-ulsfo): rolling ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ [[phab:T255015|T255015]]
* 15:29 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Make kafka purges config more robust, gerrit:603649, CS.php (duration: 01m 05s)
* 15:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make kafka purges config more robust, gerrit:603649, IS.php (duration: 01m 08s)
* 15:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:08 godog: roll-restart prometheus k8s to enable thanos upload
* 15:02 ema: A:cp-ulsfo: rolling ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ [[phab:T255015|T255015]]
* 14:43 ema: A:cp rolling systemctl restart trafficserver
* 14:28 ema: systemctl restart trafficserver for instances critical in icinga
* 14:21 ema: cp3056: ats-backend-restart
* 14:09 ema: A:cp rolling ats-be/ats-tls restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ [[phab:T255015|T255015]]
* 14:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 into s7', diff saved to https://phabricator.wikimedia.org/P11458 and previous config saved to /var/cache/conftool/dbconfig/20200610-135753-marostegui.json
* 13:50 ema: cp3050: ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ [[phab:T255015|T255015]]
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 into s7', diff saved to https://phabricator.wikimedia.org/P11457 and previous config saved to /var/cache/conftool/dbconfig/20200610-135039-marostegui.json
* 13:40 ema: cp3050: ats-backend-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ [[phab:T255015|T255015]]
* 13:36 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.36 (duration: 01m 04s)
* 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.36
* 12:33 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 12:32 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 12:13 akosiaris: pool thumbor2002, thumbor2001. [[phab:T251570|T251570]]
* 12:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2002.codfw.wmnet
* 12:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2001.codfw.wmnet
* 11:50 marostegui: Deploy schema change on commonswiki codfw [[phab:T255003|T255003]]
* 11:41 moritzm: upgrading remaining app servers in codfw to PHP 7.2.31
* 11:38 marostegui: Deploy schema change on testcommonswiki [[phab:T255003|T255003]]
* 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|52091b8}}: Grant cswiki accountcreators tboverride-account and override-antispoof ([[phab:T254927|T254927]]) (duration: 01m 06s)
* 11:13 moritzm: upgrading remaining job runners in codfw to PHP 7.2.31
* 11:02 marostegui: Stop MySQL on db1094 to clone db1127
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 moving to clone db1127 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11453 and previous config saved to /var/cache/conftool/dbconfig/20200610-110204-marostegui.json
* 10:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 moving it to s7 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11452 and previous config saved to /var/cache/conftool/dbconfig/20200610-103742-marostegui.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11451 and previous config saved to /var/cache/conftool/dbconfig/20200610-102805-marostegui.json
* 10:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254036|T254036]] Undeploy CollaborationKit: IV – Drop flag to load (duration: 01m 05s)
* 10:23 jayme: [[phab:T254581|T254581]] re-enabled puppet on all mw, api and jobrunner servers
* 10:20 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T254036|T254036]] Undeploy CollaborationKit: III – Drop ability to load (duration: 01m 05s)
* 10:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254036|T254036]] Undeploy CollaborationKit: II – Disable on Test Wikipedia (duration: 01m 37s)
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11450 and previous config saved to /var/cache/conftool/dbconfig/20200610-101407-marostegui.json
* 10:12 moritzm: upgrading remaining API servers in codfw to PHP 7.2.31
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11449 and previous config saved to /var/cache/conftool/dbconfig/20200610-100834-marostegui.json
* 10:03 jynus: cloning reviewdb into reviewdb-test at db1132 with replication enabled [[phab:T254516|T254516]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103 into x1', diff saved to https://phabricator.wikimedia.org/P11448 and previous config saved to /var/cache/conftool/dbconfig/20200610-100306-marostegui.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11447 and previous config saved to /var/cache/conftool/dbconfig/20200610-100037-marostegui.json
* 09:35 volans: imported 0.0.38-1+deb10u1 into buster-wikimedia APT - [[phab:T245114|T245114]]
* 09:35 marostegui: Stop mysql on db1127 to clone db1103
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for cloning db1103 - [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11443 and previous config saved to /var/cache/conftool/dbconfig/20200610-093440-marostegui.json
* 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:31 godog: configure thanos-be1* HDDs as raid0 - [[phab:T252186|T252186]]
* 09:26 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1103 to dbctl, depooled [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11442 and previous config saved to /var/cache/conftool/dbconfig/20200610-092603-marostegui.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1103:3312 and db1103:3314', diff saved to https://phabricator.wikimedia.org/P11441 and previous config saved to /var/cache/conftool/dbconfig/20200610-092406-marostegui.json
* 09:14 jayme: [[phab:T254581|T254581]] disabling puppet on all mw, api and jobrunner servers to move termbox envoy config to TLS
* 09:08 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:50 XioNoX: make asw1-eqsin interfaces Homer like - [[phab:T250429|T250429]]
* 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:45 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:17 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:15 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 07:53 kormat: reimaging db1077 [[phab:T252027|T252027]]
* 07:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 07:36 XioNoX: make asw2-ulsfo interfaces Homer like - [[phab:T250429|T250429]]
* 07:33 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 07:31 moritzm: upgrade mw1298-mw1309 (job runners) to PHP 7.2.31
* 07:26 XioNoX: trunk public vlan to esams ganeti hosts - [[phab:T254157|T254157]]
* 07:16 XioNoX: trunk public vlan to eqsin ganeti hosts - [[phab:T254157|T254157]]
* 07:15 moritzm: upgrade remaining API servers in eqiad to PHP 7.2.31
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103 for reimage - [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11439 and previous config saved to /var/cache/conftool/dbconfig/20200610-070822-marostegui.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2113 after on-site maintenance [[phab:T251570|T251570]]', diff saved to https://phabricator.wikimedia.org/P11438 and previous config saved to /var/cache/conftool/dbconfig/20200610-070508-marostegui.json
* 06:53 XioNoX: trunk public vlan to ulsfo ganeti hosts - [[phab:T254157|T254157]]
* 05:10 marostegui: Deploy schema change on s3 master with 2 minutes sleep between wikis - [[phab:T206103|T206103]]
 
== 2020-06-09 ==
* 23:18 Reedy: run namespaceDupes.php --fix for hiwikibooks [[phab:T254012|T254012]]
* 23:10 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254706|T254706]] [[phab:T254012|T254012]] [[phab:T241893|T241893]] (duration: 01m 06s)
* 23:03 Reedy: created wikilove_log on slwiki [[phab:T254706|T254706]]
* 20:00 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.32 (duration: 05m 11s)
* 19:51 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.36
* 19:42 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.36 (duration: 57m 47s)
* 19:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:45 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.36
* 18:41 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/TimedMediaHandler/includes/TimedMediaHandler.php: [[phab:T254824|T254824]] Avoid undefined index error (duration: 00m 57s)
* 18:36 volans: migrated mgmt DNS records in eqsin to the Netbox-generated records - [[phab:T233183|T233183]]
* 18:13 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/CheckUser/: [[phab:T234921|T234921]] [[phab:T254912|T254912]] Use UserGroupManagerFactory with correct domain to fetch groups (duration: 02m 26s)
* 18:12 volans: uploaded cumin_4.0.0rc1-1_amd64.deb to apt.wikimedia.org buster-wikimedia
* 16:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:06 longma: cutting the branch for 1.35.0-wmf.36 [[phab:T254173|T254173]]
* 15:26 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:25 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:06 volans: forcing a debmonitor GC to verify the fix of [[phab:T254865|T254865]]
* 14:59 mutante: gerrit2001 - delete gerrit logfiles older than 30 days, crons are now enabled to keep doing it in the future
* 14:55 volans@deploy1001: Finished deploy [debmonitor/deploy@44aa1ee]: Release v0.2.5 (duration: 00m 43s)
* 14:54 volans@deploy1001: Started deploy [debmonitor/deploy@44aa1ee]: Release v0.2.5
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2131 after reimage', diff saved to https://phabricator.wikimedia.org/P11436 and previous config saved to /var/cache/conftool/dbconfig/20200609-144929-marostegui.json
* 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 moritzm: rebooting auth1002
* 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 elukey: update release repository's settings on Archiva - [[phab:T254849|T254849]]
* 14:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 13:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131 for reimage', diff saved to https://phabricator.wikimedia.org/P11434 and previous config saved to /var/cache/conftool/dbconfig/20200609-123817-marostegui.json
* 12:22 kormat: reimaging sretest1002 [[phab:T252027|T252027]]
* 12:18 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:16 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:14 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1141 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11433 and previous config saved to /var/cache/conftool/dbconfig/20200609-120009-marostegui.json
* 11:50 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11432 and previous config saved to /var/cache/conftool/dbconfig/20200609-115016-marostegui.json
* 11:46 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1148 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11431 and previous config saved to /var/cache/conftool/dbconfig/20200609-114615-marostegui.json
* 11:44 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11430 and previous config saved to /var/cache/conftool/dbconfig/20200609-113818-marostegui.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11429 and previous config saved to /var/cache/conftool/dbconfig/20200609-113702-marostegui.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11428 and previous config saved to /var/cache/conftool/dbconfig/20200609-113056-marostegui.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11427 and previous config saved to /var/cache/conftool/dbconfig/20200609-112701-marostegui.json
* 11:15 ladsgroup@deploy1001: Synchronized langlist: [[gerrit:602675{{!}}Add be-tarask to langlist (T111853)]] (duration: 00m 57s)
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11426 and previous config saved to /var/cache/conftool/dbconfig/20200609-111443-marostegui.json
* 10:49 elukey: update pcc facts
* 10:48 moritzm: imported tqdm 4.23.4-1+wmf1 to buster-wikimedia/component/spicerack
* 10:35 volans: installed spicerack 0.0.38 on cumin[12]001
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1141 depooled to s4 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11425 and previous config saved to /var/cache/conftool/dbconfig/20200609-103252-marostegui.json
* 10:27 volans: uploaded spicerack_0.0.38-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 10:14 jayme: restarting pybal on lvs1015 and lvs2009 for [[phab:T254581|T254581]]
* 10:12 XioNoX: "Re-order some BGP transit neighbors terms"
* 10:07 marostegui: Deploy schema change on s7 [[phab:T206103|T206103]]
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 10:00 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:57 jayme: restarting pybal on lvs1016 and lvs2010 for [[phab:T254581|T254581]]
* 09:57 akosiaris: correction: depool and set as inactive thumbor200<nowiki>{</nowiki>1,2<nowiki>}</nowiki> for [[phab:T251570|T251570]]
* 09:57 akosiaris: depool and set as inactive thumber200<nowiki>{</nowiki>1,2<nowiki>}</nowiki> for [[phab:T251750|T251750]]
* 09:56 vgutierrez: disable parent proxies on ats-tls
* 09:55 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
* 09:55 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
* 09:41 marostegui: Compress InnoDB on db2072 [[phab:T254462|T254462]]
* 09:34 marostegui: Stop MySQL on db1148 to clone db1141 - [[phab:T252512|T252512]]
* 09:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 to clone db1141 - [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11423 and previous config saved to /var/cache/conftool/dbconfig/20200609-092915-marostegui.json
* 09:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:01 moritzm: rolling restart of cassandra on maps* to pick up Java security updates
* 08:39 moritzm: upgrading snapshot servers to PHP 7.2.31
* 08:28 moritzm: upgrading deployment servers to PHP 7.2.31
* 08:01 marostegui: stop m1 on db1117 to clone db1097 (this will trigger an haproxy irc alert) - [[phab:T254556|T254556]]
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1097 from config', diff saved to https://phabricator.wikimedia.org/P11421 and previous config saved to /var/cache/conftool/dbconfig/20200609-073635-marostegui.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:30 moritzm: upgrading mw1390-mw1413 to PHP 7.2.31
* 07:11 ema: deployment-cache-text06: stop vhtcpd, start purged [[phab:T254844|T254844]]
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314, db1097:3315 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11420 and previous config saved to /var/cache/conftool/dbconfig/20200609-070917-marostegui.json
* 06:53 marostegui: Stop MySQL on db2113 for maintenance - [[phab:T251570|T251570]]
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2113 for on-site maintenance [[phab:T251570|T251570]]', diff saved to https://phabricator.wikimedia.org/P11419 and previous config saved to /var/cache/conftool/dbconfig/20200609-065125-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1091 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11418 and previous config saved to /var/cache/conftool/dbconfig/20200609-064829-marostegui.json
* 06:40 marostegui: Deploy schema change on s2 [[phab:T206103|T206103]]
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11417 and previous config saved to /var/cache/conftool/dbconfig/20200609-063344-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11416 and previous config saved to /var/cache/conftool/dbconfig/20200609-061916-marostegui.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11415 and previous config saved to /var/cache/conftool/dbconfig/20200609-055128-marostegui.json
* 05:32 marostegui: Switch dbproxy1018 from "master" service to "replicas" - [[phab:T249188|T249188]]
* 01:02 eileen: civicrm revision changed from {{Gerrit|4a19db672f}} to {{Gerrit|80a0d22350}}, config revision is {{Gerrit|386b9bc457}}
* 00:39 ejegg: updated payments-wiki from {{Gerrit|c1d14a5db7}} to {{Gerrit|aceddff8b5}}
* 00:30 shdubsh: restart elasticsearch on logstash1010
* 00:24 eileen: civicrm revision changed from {{Gerrit|be4c5a4951}} to {{Gerrit|4a19db672f}}, config revision is {{Gerrit|386b9bc457}}
 
== 2020-06-08 ==
* 23:49 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|If991929c84ff69}} (duration: 00m 57s)
* 23:35 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|I8c22a1a8fc402}} (duration: 00m 58s)
* 23:32 foks: removing one file for legal compliance
* 23:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 23:02 ryankemper@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 22:58 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 22:53 ryankemper@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 22:53 shdubsh: update mtail to 3.0.0~rc35 on mw and wtp hosts codfw
* 22:49 eileen: civicrm revision changed from {{Gerrit|11b0e7c7e5}} to {{Gerrit|be4c5a4951}}, config revision is {{Gerrit|386b9bc457}}
* 22:49 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:52 Amir1: applying the sql alter table on [[gerrit:594292{{!}}ipblocks]] on labswiki ([[phab:T251188|T251188]])
* 20:27 RoanKattouw: Running initUserPreference.php -s growthexperiments-homepage-enable -t growthexperiments-help-panel-tog-help-panel on wikis that have GrowthExperiments installed ([[phab:T240920|T240920]])
* 18:56 Urbanecm: Morning <del>SWAT</del> config/backport window done
* 18:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1630a10}}: Set wgProofreadPagePageJoiner to __PAGEJOIN__ for zhwikisource ([[phab:T205826|T205826]]) (duration: 00m 58s)
* 18:55 urbanecm@deploy1001: sync-file aborted: SWAT: {{Gerrit|1630a10}}: Set wgProofreadPagePageJoiner to __PAGEJOIN__ for zhwikisource (duration: 00m 00s)
* 18:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e85203}}: Enable subpages in Page namespace on napwikisource ([[phab:T252755|T252755]]) (duration: 00m 58s)
* 18:44 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: End GrowthExperiments homepage A/B test ([[phab:T254413|T254413]]) (duration: 00m 57s)
* 18:23 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purges for testwiki ([[phab:T250781|T250781]]) (part 2) (duration: 00m 56s)
* 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable HTCP purges for testwiki ([[phab:T250781|T250781]]) (part 1) (duration: 00m 59s)
* 17:50 elukey: restart prometheus burrow exporter for kafka main on kafkamon1001 - [[phab:T254498|T254498]]
* 17:43 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/resources/src/mediawiki.misc-authed-curate/rollback.js: Fix: Diff pages show rollback confirmation prompt if there is the "Mark as patrolled" link ([[phab:T254538|T254538]]) (duration: 00m 59s)
* 17:14 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 16:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 16:44 liw: testing upcoming Scap release on beta
* 15:29 hnowlan: Migrated all cpjobqueue jobs from scb to Kubernetes
* 15:29 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@07d8c32]: Disabling jobs migrated to k8s (duration: 04m 34s)
* 15:28 jynus@cumin2001: dbctl commit (dc=all): 'depool db2075 for mw maintenance [[phab:T254139|T254139]]', diff saved to https://phabricator.wikimedia.org/P11411 and previous config saved to /var/cache/conftool/dbconfig/20200608-152811-jynus.json
* 15:24 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@07d8c32]: Disabling jobs migrated to k8s
* 15:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part III out of III ([[phab:T254536|T254536]]) (duration: 00m 57s)
* 15:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part II out of III ([[phab:T254536|T254536]]) (duration: 00m 57s)
* 15:09 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part I out of III ([[phab:T254536|T254536]]) (duration: 00m 59s)
* 15:05 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:53 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin A:mw-canary 'enable-puppet "cdanis deploying {{Gerrit|I25ab44c1}} [[phab:T252605|T252605]]"'
* 14:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:48 papaul: powering down ms-be2016 for BBU replacement
* 14:47 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin A:mw-canary 'disable-puppet "cdanis deploying {{Gerrit|I25ab44c1}} [[phab:T252605|T252605]]"'
* 14:41 moritzm: upgrading mw API servers in codfw to PHP 7.2.31
* 14:00 jbond42: updating puppet-merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/602738/4
* 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:50 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update mitigations for [[phab:T250887|T250887]] (duration: 00m 57s)
* 13:41 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 12:23 XioNoX: repool codfw - [[phab:T243080|T243080]]
* 12:18 XioNoX: rollback cr2-codfw vrrp/ospf/bgp changes - [[phab:T243080|T243080]]
* 12:18 marostegui: Compress InnoDB on db2094:3311 [[phab:T254462|T254462]]
* 12:09 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 12:05 XioNoX: reboot cr2-codfw:re0 (backup) - [[phab:T243080|T243080]]
* 11:53 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 11:53 moritzm: restarting dnsdist on malmok
* 11:53 marostegui: Deploy schema change on s3 - [[phab:T251188|T251188]]
* 11:49 XioNoX: reboot cr2-codfw:re1 (backup) - [[phab:T243080|T243080]]
* 11:45 moritzm: restarting slapd on ldap-corp* for Gnu TLS security update
* 11:43 moritzm: rolling restart of Apache on Kibana/7 host to pick up Gnu TLS security update
* 11:41 XioNoX: de-pref cr2-codfw OSPF - [[phab:T243080|T243080]]
* 11:39 XioNoX: deactivate cr2-codfw transit/peering - [[phab:T243080|T243080]]
* 11:38 XioNoX: fail vrrp master from cr2 to cr1 - [[phab:T243080|T243080]]
* 11:32 XioNoX: cr1-codfw set OSPF metrics back to normal - [[phab:T243080|T243080]]
* 11:30 XioNoX: cr1-codfw re-enable transit/peering - [[phab:T243080|T243080]]
* 11:29 XioNoX: cr1-codfw add graceful-restart - [[phab:T243080|T243080]]
* 11:28 XioNoX: cr1-codfw add graceful-switchover - [[phab:T243080|T243080]]
* 11:18 Lucas_WMDE: EU SWAT done
* 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:602981{{!}}Remove Wikibase idBlacklist setting (T254686)]], part 2 (duration: 00m 56s)
* 11:15 XioNoX: cr1-codfw> request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:602981{{!}}Remove Wikibase idBlacklist setting (T254686)]], part 1 (duration: 00m 56s)
* 11:11 XioNoX: reboot cr1-codfw:re0 (backup) - [[phab:T243080|T243080]]
* 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:601409{{!}}Enable GrowthExperiments guidance everywhere behind feature flag (T253794)]] (duration: 00m 57s)
* 11:05 marostegui: Install events on es1 [[phab:T254689|T254689]]
* 11:05 XioNoX: install Junos on cr1-codfw:re0 (backup) - [[phab:T243080|T243080]]
* 10:56 XioNoX: do cr1-codfw RE mastership switch - [[phab:T243080|T243080]]
* 10:53 XioNoX: reboot cr1-codfw:re1 (backup) - [[phab:T243080|T243080]]
* 10:46 XioNoX: install Junos on cr1-codfw:re1 (backup) - [[phab:T243080|T243080]]
* 10:43 XioNoX: deactivate cr1-codfw transit/peering - [[phab:T243080|T243080]]
* 10:41 XioNoX: bump all cr1-codfw OSPF metrics - [[phab:T243080|T243080]]
* 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:603408{{!}} Bumping portals to master (603408)]] (duration: 00m 57s)
* 10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:603408{{!}} Bumping portals to master (603408)]] (duration: 01m 09s)
* 10:39 XioNoX: depool codfw - [[phab:T243080|T243080]]
* 09:46 moritzm: installing gnutls28 security updates on buster (older releases not affected)
* 09:32 qchris: Turning on puppet on gerrit1002 again to avoid starting to lag too far behind
* 08:17 XioNoX: push [[phab:T250136|T250136]] to eqsin - [[phab:T250136|T250136]]
* 08:09 XioNoX: push [[phab:T250136|T250136]] to eqiad - [[phab:T250136|T250136]]
* 08:07 moritzm: upgrading mw1349-mw1383 to PHP 7.2.31
* 08:07 mutante: stat1006: moved broken jupyter-dedcode-singleuser.service out of /run/systemd/transient, then ran systemctl reset-failed
* 08:02 XioNoX: push [[phab:T250136|T250136]] to codfw - [[phab:T250136|T250136]]
* 07:58 XioNoX: push [[phab:T250136|T250136]] to eqord/eqdfw - [[phab:T250136|T250136]]
* 07:58 mutante: stat1006 bash[40607]: /bin/bash: line 0: exec: jupyterhub-singleuser: not found
* 07:57 mutante: ran puppet on all stat* hosts for an access request (dcipoletti was added) - stat1006 systemd state broke right after, jupyter-dedcode-singleuser.service  failed
* 07:46 XioNoX: push [[phab:T250136|T250136]] to esams/knams - [[phab:T250136|T250136]]
* 07:42 XioNoX: cr4-ulsfo protocols bgp group Transit4 family inet any -> unicast - [[phab:T250136|T250136]]
* 07:39 XioNoX: cr3-ulsfo protocols bgp group Transit4 family inet any -> unicast - [[phab:T250136|T250136]]
* 07:37 moritzm: installing nodejs security updates
* 07:05 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 [[phab:T249188|T249188]]
* 05:22 marostegui: Upgrade db1077 to 10.4.13 to test events memory leak
* 04:45 _joe_: de-firewalling mc1029
* 04:27 _joe_: firewalling off memcached on mc1029
 
== 2020-06-05 ==
* 16:45 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 (duration: 00m 11s)
* 16:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0
* 16:29 bd808: Testing stashbot following hard restart of service. It was having LDAP connection failure problems.
* 16:00 AndyRussG: Turned off Fundraising job recurring_smashpig_charge
* 15:54 cdanis: enabling & rerunning puppet on netflow* [[phab:T254574|T254574]]
* 15:39 cdanis: disabling puppet on netflow* and trying {{Gerrit|I6598d8f8}} on netflow3001 first [[phab:T254574|T254574]]
* 15:39 cdanis: disabling puppet on netflow* and trying {{Gerrit|I6598d8f8}} on netflow3001 first
* 13:33 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:18 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Hotfix for be-tarask interwiki link being broken ([[phab:T111853|T111853]]) (duration: 01m 00s)
* 12:41 mutante: rebooting gerrit1002 to add more vCPUs, after [ganeti1009:~] $ sudo gnt-instance modify -B vcpus=8 gerrit1002.wikimedia.org [[phab:T239151|T239151]]
* 12:20 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:17 akosiaris: update blubberoid changeprop changeprop-jobqueue citoid cxserver wikifeeds zotero in staging to latest charts
* 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:17 akosiaris: fix typo in ganeti2016 /etc/network/interfaces and reboot
* 11:28 akosiaris: master-failover from ganeti2001 to ganeti2019 for ganeti01.svc.codfw.wmnet
* 11:25 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:25 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:14 mutante: running puppet on all ganeti nodes
* 11:05 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:25 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 akosiaris: reimage ganeti2016 for stretch
* 08:42 akosiaris: migrate mx2001.wikimedia.org to new ganeti nodes
* 08:40 akosiaris: migrate acrab to new ganeti nodes
* 08:38 akosiaris: failover master IP from ganeti1003 to ganeti1009
* 08:37 akosiaris: empty ganeti100<nowiki>{</nowiki>1,2,3,4<nowiki>}</nowiki>. Move all VMs to new ganeti nodes
* 08:28 akosiaris: migrate seaborgium.wikimedia.org to new ganeti nodes
* 08:27 akosiaris: migrate etherpad1002 to new ganeti nodes
* 08:11 marostegui: Upgrade db2075 to 10.1.45
* 07:52 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]
* 07:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 06:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:17 elukey@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-06-04 ==
* 23:45 catrope@deploy1001: Synchronized wmf-config/mc.php: Set coalesceKeys=non-global for WANCache on enwiki (duration: 00m 59s)
* 23:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Minerva site notices on Wikivoyage wikis ([[phab:T254391|T254391]]) (duration: 00m 58s)
* 23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set guwiki timezone to Asia/Kolkata ([[phab:T253827|T253827]]) (duration: 00m 57s)
* 23:17 catrope@deploy1001: Synchronized static/images/: Change logo for zhwiki ([[phab:T254467|T254467]]) (duration: 01m 00s)
* 22:56 ryankemper: re-enabled puppet on `cloudelastic1006`. All `cloudelastic` instances now have puppet enabled and are in sync
* 20:56 ryankemper: enabled puppet on `cloudelastic1005` in order to kick off a puppet run and verify that this new node joins the ES cluster properly
* 20:39 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
* 20:38 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
* 19:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.35
* 15:12 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1004.eqiad.wmnet
* 15:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:10 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 15:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:36 moritzm: installing libexif security updates on jessie
* 14:08 moritzm: installing clamav security updates on mendelevium (ticket.wikimedia.org)
* 14:00 qchris: Stopping puppet on gerrit1002 (gerrit-test) to run tests for Gerrit upgrade
* 13:41 moritzm: bounced ferm on ms-be1023
* 13:35 moritzm: installing exim security updates on jessie (stretch/buster already done)
* 12:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c06e720: Revert "wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex" ([[phab:T253574|T253574]]) (duration: 01m 06s)
* 12:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:14 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:02 moritzm: upgrading mw1276 to PHP 7.2.31
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11396 and previous config saved to /var/cache/conftool/dbconfig/20200604-115933-marostegui.json
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ec07467}}: wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex ([[phab:T253574|T253574]]) (duration: 01m 15s)
* 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|338cb90}}: {{Gerrit|1ade16f}}: Change $wgNamespaceRobotPolicies on Thai wikis ([[phab:T253578|T253578]]; [[phab:T253577|T253577]]; [[phab:T253576|T253576]]; [[phab:T253575|T253575]]; [[phab:T253574|T253574]]) (duration: 01m 07s)
* 11:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11395 and previous config saved to /var/cache/conftool/dbconfig/20200604-114149-marostegui.json
* 11:29 marostegui: Compress InnoDB on db1091 before pooling it as new slave on s1 - [[phab:T254462|T254462]]
* 11:21 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [metawiki] Add `centralauth-rename` to WMF OIT staff - [[phab:T254372|T254372]] (duration: 01m 08s)
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:04 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:59 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:53 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
* 10:46 marostegui: Deploy schema change on s3 (only testwiki) eqiad - [[phab:T238966|T238966]]
* 10:42 marostegui: Deploy schema change on s3 (only testwiki) codfw - [[phab:T238966|T238966]]
* 10:41 jbond42: deployed new version of puppet-merge revert is https://gerrit.wikimedia.org/r/c/operations/puppet/+/602329
* 09:57 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:46 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:42 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 09:42 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 09:41 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 09:41 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 09:26 moritzm: rolling restart of cassandra on maps* to pick up Java security updates