You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labs: Move RB traffic to new stretch host (duration: 01m 11s))
imported>Stashbot
(TheresNoTime: T302486 : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`)
 
(869 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2020-04-19 ==
== 2022-12-04 ==
* 16:19 reedy@deploy1001: Synchronized wmf-config/LabsServices.php: labs: Move RB traffic to new stretch host (duration: 01m 11s)
* 04:19 TheresNoTime: [[phab:T302486|T302486]] : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`
* 16:05 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 05:51 marostegui: Power back on db1140 [[phab:T250602|T250602]]


== 2020-04-18 ==
== 2022-12-03 ==
* 22:50 addshore: pool wdqs1006 blazegraph caught up [[phab:T242453|T242453]]
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 20:30 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 20:27 thcipriani: restart gerrit-replica
* 16:40 dcausse: forcing replica count to 1 on some cloudelastic@chi indices
* 15:13 Amir1: applying schema change of [[phab:T139090|T139090]] on labswiki (wikitech)
* 14:03 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 12:19 addshore: restarting blazegraph on wdqs1006 blazegraph stuck [[phab:T242453|T242453]]
* 12:15 addshore: depool wdqs1006 blazegraph stuck [[phab:T242453|T242453]]
* 06:07 XioNoX: change OSPF metrics to prefer ulsfo tunnel transport


== 2020-04-17 ==
== 2022-12-02 ==
* 19:33 Krinkle: Depool mw1407.eqiad.wmnet for opcache testing.  Do not repool without first reverting https://gerrit.wikimedia.org/r/589674.
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:32 Krinkle: Depool mw1407.eqiad.wmnet for opcache and LCStoreStaticArray testing. – [[phab:T99740|T99740]]
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 17:41 cmjohnson1: replacing network cable pc1009 [[phab:T250257|T250257]]
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 17:34 cmjohnson1: moving msw1 to msw-c racks mounted switch cable ports from port 49 to port 50
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:15 Urbanecm: Revert recent email change of User:CPHL@SUL's email
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 16:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:11 sukhe: restart pybal on lvs5004
* 16:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 15:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 15:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 15:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:42 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 15:20 rzl: remove cronjobs from mwmaint1002 previously updated to systemd timers and erroneously left in crontab -- diffs: https://phabricator.wikimedia.org/P11012 [[phab:T211250|T211250]]
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 14:29 mutante: ganeti2001 - killed and restarted gnt-rapi process with the correct new key and cert
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 14:19 cdanis: add peer AS29802 to cr2-eqdfw and cr2-esams
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 14:01 mutante: netbox1001 - netbox_ganeti_eqiad_synx / systemd state fixed after gnt-rapi is running again on ganeti1003
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 14:00 mutante: ganeti1003 - fixing gnt-rapi daemon not running
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 13:54 mateusbs17: Running VACUUM FULL for gis DB in maps2004.codfw.wmnet (which is depooled at the moment)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 mutante: netbox1001 - sudo systemctl start netbox_ganeti_eqiad_sync (was failed)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 12:54 mutante: contint2001 /usr/local/sbin/build-envoy-config -c /etc/envoy ; restart envoyproxy; was not listening on admin port
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 12:45 mutante: cntint2001 - restart nagios-nrpe-server
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:28 moritzm: copied kubernetes-client from stretch-wikimedia to buster-wikimedia [[phab:T224591|T224591]]
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 11:35 mutante: contint2001 - apt-get update, run puppet to install helm-diff
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 11:33 jayme: imported helm-diff 2.11.0+3-2+deb10u1 to main for buster-wikimedia
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 11:23 dzahn@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 11:23 dzahn@cumin2001: START - Cookbook sre.hosts.decommission
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 11:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 11:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 11:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 11:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 10:17 _joe_: contint1001:~$ sudo systemctl restart envoyproxy.service
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 10:16 _joe_: contint1001:~$ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 10:07 kormat: change pc2010 to replicate from pc1010 [[phab:T247787|T247787]]
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 09:54 kormat: enabling replication from pc1007 to pc1010 [[phab:T247787|T247787]]
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 09:20 jayme: imported helm 2.12.2 to main for buster-wikimedia
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 09:07 vgutierrez: disable KA between ats-tls and varnish-fe on cp1077 - [[phab:T250258|T250258]]
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 09:00 kormat: dropping wikidatawiki.wb_items_per_site_old table in eqiad (non-labs hosts) [[phab:T250345|T250345]]
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 08:15 kormat: dropping wikidatawiki.wb_items_per_site_old table in codfw  [[phab:T250345|T250345]]
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 07:54 ema: cache_text: puppet run to stop vhtcpd and start purged [[phab:T249325|T249325]]
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 07:45 gehel: restart wdqs-updater on all nodes after deployment
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11005 and previous config saved to /var/cache/conftool/dbconfig/20200417-063138-marostegui.json
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1111 from API', diff saved to https://phabricator.wikimedia.org/P11004 and previous config saved to /var/cache/conftool/dbconfig/20200417-063038-marostegui.json
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11003 and previous config saved to /var/cache/conftool/dbconfig/20200417-062642-marostegui.json
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11002 and previous config saved to /var/cache/conftool/dbconfig/20200417-061907-marostegui.json
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092 after compression', diff saved to https://phabricator.wikimedia.org/P11001 and previous config saved to /var/cache/conftool/dbconfig/20200417-060419-marostegui.json
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 12:09 jynus: dropping all databases from db1133
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster


== 2020-04-16 ==
== 2022-12-01 ==
* 22:34 maryum: reindexing wikis that failed from previous reindex on mwmain1002
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 22:10 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.26 (duration: 05m 26s)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:59 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/FlaggedRevs/: [[phab:T250439|T250439]] Don't try to create a Revision with null (duration: 01m 02s)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 21:54 bsitzmann@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 21:51 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 21:48 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 20:42 mstyles@deploy1001: Finished deploy [wdqs/wdqs@1fb52b3]: WDQS version 0.3.22 (duration: 11m 43s)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 20:30 mstyles@deploy1001: Started deploy [wdqs/wdqs@1fb52b3]: WDQS version 0.3.22
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:01 maryum: "beginning deploy of WDQS 0.3.22"
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 19:06 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.28
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 18:57 krinkle@deploy1001: Synchronized errorpages/404.php: {{Gerrit|I9fd5c99130c64}} (duration: 01m 07s)
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 XioNoX: rename/format asw-ulsfo interfaces to match future homer driven format
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 16:51 herron: kafka-logging eqiad set retention.bytes=500000000000 on topic udp_localhost-warning [[phab:T250133|T250133]]
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 16:45 herron: kafka-logging eqiad set retention.bytes=500000000000 on topic udp_localhost-info [[phab:T250133|T250133]]
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 16:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 15:54 elukey: restart chi on cloudelastic1001 with -XX:NewRatio=3 - [[phab:T231517|T231517]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 15:26 akosiaris: truncate /var/log/ganeti/monitoring-daemon-error.log on ganeti1003, start again all ganeti daemons
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 akosiaris: stop ganeti daemons on ganeti1003
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 15:02 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Petri Gyula' '23eki' ([[phab:T250387|T250387]])
* 22:50 urbanecm@deploy1002: Started scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]]
* 14:51 hknust: holger@mwmaint1002 END (Fail)  uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]]
* 22:49 urbanecm@deploy1002: backport aborted (duration: 00m 03s)
* 14:30 hknust: holger@mwmaint1002 Starting uppercaseTitlesForUnicodeTransition.php as part of [[phab:T219279|T219279]]
* 22:42 andrewbogott: upgraded wikitech-static-ord (aka wikitech-static) to Debian Buster, installed php7.4, upgraded MW to 1_39. Will delete the rackspace backup image in a few days.
* 14:21 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 22:19 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 14:17 hnowlan@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Enabling rules on k8s, disabling on scb (duration: 01m 12s)
* 22:07 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 14:16 hnowlan@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Enabling rules on k8s, disabling on scb
* 22:02 cwhite: restart swift-proxy on thanos::frontend eqiad
* 14:14 dcausse: elastic (search cluster) reindexing commonswiki_content in codfw and eqiad ([[phab:T246882|T246882]])
* 22:01 brennen: end of utc late backport & config window
* 14:13 ema: cache: upgrade varnish to 5.1.3-1wm14 and rolling restart [[phab:T249810|T249810]]
* 21:46 brennen@deploy1002: Finished scap: Backport for [[gerrit:859568{{!}}GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)]] (duration: 07m 48s)
* 13:40 XioNoX: rename/format asw2-esams interfaces to match future homer driven format
* 21:40 brennen@deploy1002: brennen and kharlan: Backport for [[gerrit:859568{{!}}GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:36 kormat: Optimizing all tables on pc1010 [[phab:T247787|T247787]]
* 21:38 brennen@deploy1002: Started scap: Backport for [[gerrit:859568{{!}}GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)]]
* 13:32 hashar: Restarting CI Jenkins for plugin upgrade [[phab:T250377|T250377]]
* 21:34 brennen@deploy1002: Finished scap: Backport for [[gerrit:863011{{!}}New configs for android schemas]] (duration: 09m 49s)
* 13:04 hnowlan@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again (duration: 00m 30s)
* 21:26 brennen@deploy1002: brennen and sharvaniharan: Backport for [[gerrit:863011{{!}}New configs for android schemas]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:04 hnowlan@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again
* 21:25 andrewbogott: saving an image of wikitech-static-ord (aka wikitech-static) before upgrading the host to Buster
* 13:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:25 brennen@deploy1002: Started scap: Backport for [[gerrit:863011{{!}}New configs for android schemas]]
* 12:54 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 21:22 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
* 12:54 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:21 brennen@deploy1002: Finished scap: Backport for [[gerrit:861853{{!}}Start writing to cul_actor on test wikis (T233004)]] (duration: 14m 56s)
* 12:48 vgutierrez: pool cp1087
* 21:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw[1307-1326].eqiad.wmnet
* 12:44 jynus: test sal again
* 21:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
* 11:29 elukey: restart atskafka on cp3050 after maintenance
* 21:08 brennen@deploy1002: brennen and zabe: Backport for [[gerrit:861853{{!}}Start writing to cul_actor on test wikis (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:22 XioNoX: rename/format asw1-eqsin interfaces to match future homer driven format
* 21:06 brennen@deploy1002: Started scap: Backport for [[gerrit:861853{{!}}Start writing to cul_actor on test wikis (T233004)]]
* 11:17 elukey: stop atskafka on cp3050 to re-create the topic atskafka_test_webrequest_text on Kafka Jumbo - [[phab:T250347|T250347]]
* 20:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab1004.wikimedia.org
* 11:16 Urbanecm: EU SWAT done
* 20:47 aokoth@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab1004.wikimedia.org
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|a105f38}}: Remove broken groupOverrides from amwikimedia ([[phab:T249585|T249585]]) (duration: 01m 05s)
* 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|70ee5f6}}: Remove grants for tboverride and tboverride-account ([[phab:T241114|T241114]]) (duration: 01m 06s)
* 20:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata ([[phab:T250348|T250348]]; take II) (duration: 01m 04s)
* 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata ([[phab:T250348|T250348]]) (duration: 01m 06s)
* 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 11:03 urbanecm@deploy1001: sync-file aborted: SWAT: {{Gerrit|74ad793}}: Turn off direct account creations at Testwikidata (duration: 00m 00s)
* 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
* 10:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 20:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikmiedia.org/T324195
* 10:45 hnowlan@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Testing rules moved to k8s (duration: 01m 16s)
* 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikmiedia.org/T324195
* 10:45 vgutierrez: upgrading ATS to version 8.0.7-rc0-1wm3 -  [[phab:T249335|T249335]]
* 19:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
* 10:44 hnowlan@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Testing rules moved to k8s
* 19:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
* 10:44 vgutierrez: rolling restart of ats-tls to enable TLSv1.3 globally and disable the old TLS session cache - [[phab:T170567|T170567]]
* 19:44 mutante: gitlab-runner1002 - upgrading gitlab-runner package
* 10:35 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 19:44 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw13(0[7-9]{{!}}[1-3]\d{{!}}4[0-8])\..*
* 10:35 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 19:43 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 42 hosts with reason: decom
* 10:31 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 19:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:22 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 09:33 elukey: restart atskafka on cp3050 to pick up snappy compression - [[phab:T250347|T250347]]
* 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42201 and previous config saved to /var/cache/conftool/dbconfig/20221201-194301-ladsgroup.json
* 09:32 ema: cp2027: upgrade varnish to 5.1.3-1wm14 [[phab:T249810|T249810]]
* 19:42 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 42 hosts with reason: decom
* 09:17 ema: text@esams: stop vhtcpd, start purged [[phab:T249325|T249325]]
* 19:41 mutante: gitlab2002 (gitlab-replica) - upgrading gitlab-ce
* 09:16 jynus: starting es backups on backup2002 [[phab:T79922|T79922]]
* 19:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 08:33 kormat: Disconnect pc1008 replication from pc1010 [[phab:T247787|T247787]]
* 19:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns5004.wikimedia.org with OS buster
* 08:22 ema: cp3050: upgrade purged to 0.7 [[phab:T249583|T249583]]
* 19:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
* 08:22 ema: upload purged 0.7 to buster-wikimedia [[phab:T249583|T249583]]
* 19:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
* 08:21 Urbanecm: Set email for Geraki@grwikimedia ([[phab:T245911|T245911]])
* 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
* 08:18 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master [[phab:T247787|T247787]] (duration: 01m 08s)
* 19:28 dancy@deploy1002: Finished scap: testing k8s deployment (duration: 06m 17s)
* 08:06 mutante: mw1396 - restarted php7.2-fpm - was: 503 Service Unavailable - header 'X-Powered-By: PHP/7.' not found on 'http://en.wikipedia.org:80/wiki/Main_Page'
* 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42200 and previous config saved to /var/cache/conftool/dbconfig/20221201-192755-ladsgroup.json
* 08:04 mutante: mw1396 - restarted apache
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:50 vgutierrez: rolling update ats to version 8.0.7-rc0-1wm3 in cp[4026,4032,5006,5012] - [[phab:T249335|T249335]]
* 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
* 07:49 vgutierrez: upload trafficserver 8.0.7-rc0-1wm3 to apt.wm.o (buster) - [[phab:T249335|T249335]]
* 19:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 07:15 volker-e@deploy1001: Finished deploy [design/style-guide@2a7cc4a]: Deploy design/style-guide:  (duration: 00m 08s)
* 19:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
* 07:15 volker-e@deploy1001: Started deploy [design/style-guide@2a7cc4a]: Deploy design/style-guide:
* 19:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
* 06:33 moritzm: installing apache-log4j1.2 security updates on jessie
* 19:21 dancy@deploy1002: Started scap: testing k8s deployment
* 06:29 moritzm: installing icu security updates on jessie
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 06:15 moritzm: installing git security updates on jessie
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reorganize s8 weights a little bit after the addition of the new host db1114', diff saved to https://phabricator.wikimedia.org/P10995 and previous config saved to /var/cache/conftool/dbconfig/20200416-054353-marostegui.json
* 19:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.12 refs [[phab:T320517|T320517]]
* 05:33 elukey: restart hadoop-yarn-nodemanager on an-worker108[4,5] - failed after GC OOM events (heavy spark jobs)
* 19:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42199 and previous config saved to /var/cache/conftool/dbconfig/20221201-191248-ladsgroup.json
* 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bullseye
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 19:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
* 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
* 19:02 dancy@deploy1002: Installation of scap version "4.30.0" completed for 601 hosts
* 19:01 dancy@deploy1002: Installing scap version "4.30.0" for 601 hosts
* 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42197 and previous config saved to /var/cache/conftool/dbconfig/20221201-185742-ladsgroup.json
* 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
* 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
* 18:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
* 18:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
* 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
* 18:37 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw13(0[7-9]{{!}}[1-3]\d{{!}}4[0-8])\..*
* 18:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1057.eqiad.wmnet with OS bullseye
* 18:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 18:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 18:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 18:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 18:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 18:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 18:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 18:19 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 18:19 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 18:17 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 18:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 18:16 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
* 18:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
* 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
* 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42196 and previous config saved to /var/cache/conftool/dbconfig/20221201-181215-ladsgroup.json
* 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42195 and previous config saved to /var/cache/conftool/dbconfig/20221201-181153-ladsgroup.json
* 18:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1060']
* 18:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
* 18:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
* 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
* 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
* 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42194 and previous config saved to /var/cache/conftool/dbconfig/20221201-175647-ladsgroup.json
* 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
* 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 17:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
* 17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
* 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
* 17:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
* 17:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
* 17:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
* 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42193 and previous config saved to /var/cache/conftool/dbconfig/20221201-174140-ladsgroup.json
* 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
* 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
* 17:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
* 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
* 17:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
* 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
* 17:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bullseye
* 17:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1057']
* 17:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42192 and previous config saved to /var/cache/conftool/dbconfig/20221201-172634-ladsgroup.json
* 17:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
* 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
* 17:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
* 17:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42191 and previous config saved to /var/cache/conftool/dbconfig/20221201-171335-ladsgroup.json
* 17:08 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
* 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
* 17:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
* 17:01 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
* 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
* 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42190 and previous config saved to /var/cache/conftool/dbconfig/20221201-165828-ladsgroup.json
* 16:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:55 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
* 16:50 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5004
* 16:50 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5004
* 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
* 16:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:49 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
* 16:48 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
* 16:46 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42189 and previous config saved to /var/cache/conftool/dbconfig/20221201-164509-ladsgroup.json
* 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42188 and previous config saved to /var/cache/conftool/dbconfig/20221201-164437-ladsgroup.json
* 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
* 16:43 moritzm: installing ini4j security updates
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42187 and previous config saved to /var/cache/conftool/dbconfig/20221201-164322-ladsgroup.json
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
* 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
* 16:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
* 16:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
* 16:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
* 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42185 and previous config saved to /var/cache/conftool/dbconfig/20221201-162930-ladsgroup.json
* 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42184 and previous config saved to /var/cache/conftool/dbconfig/20221201-162815-ladsgroup.json
* 16:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42183 and previous config saved to /var/cache/conftool/dbconfig/20221201-161424-ladsgroup.json
* 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
* 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
* 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
* 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
* 16:00 effie: php7.4 upgrade + apache upgrade + rolling restarts of parsoid servers - [[phab:T323358|T323358]]
* 16:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
* 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42182 and previous config saved to /var/cache/conftool/dbconfig/20221201-155917-ladsgroup.json
* 15:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
* 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
* 15:57 effie: php7.4 upgrade + apache upgrade + rolling restarts of jobrunners/videoscalers servers - [[phab:T323358|T323358]]
* 15:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
* 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
* 15:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
* 15:41 effie: php7.4 upgrade + apache upgrade + rolling restarts of api servers - [[phab:T323358|T323358]]
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42181 and previous config saved to /var/cache/conftool/dbconfig/20221201-153918-ladsgroup.json
* 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42180 and previous config saved to /var/cache/conftool/dbconfig/20221201-153856-ladsgroup.json
* 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5001.wikimedia.org
* 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
* 15:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 15:34 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5001.wikimedia.org
* 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42179 and previous config saved to /var/cache/conftool/dbconfig/20221201-152350-ladsgroup.json
* 15:12 effie: php7.4 upgrade + apache upgrade + rolling restarts of app servers - [[phab:T323358|T323358]]
* 15:11 sukhe: [done] homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
* 15:10 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42178 and previous config saved to /var/cache/conftool/dbconfig/20221201-150843-ladsgroup.json
* 15:01 Lucas_WMDE: UTC afternoon backport+config window done
* 15:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:861431{{!}}Enable limited width on plwikisource MAIN namespace (T323185)]] (duration: 08m 06s)
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and soda: Backport for [[gerrit:861431{{!}}Enable limited width on plwikisource MAIN namespace (T323185)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42177 and previous config saved to /var/cache/conftool/dbconfig/20221201-145337-ladsgroup.json
* 14:52 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:861431{{!}}Enable limited width on plwikisource MAIN namespace (T323185)]]
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:50 moritzm: installing krb5 security updates
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:45 kharlan@deploy1002: Finished scap: Backport for [[gerrit:862839{{!}}GrowthExperiments: Enable new impact module on testwiki (T323526)]] (duration: 06m 12s)
* 14:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:42 XioNoX: add BGP sessions to RIPE RIS in drmrs
* 14:40 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:862839{{!}}GrowthExperiments: Enable new impact module on testwiki (T323526)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:39 kharlan@deploy1002: Started scap: Backport for [[gerrit:862839{{!}}GrowthExperiments: Enable new impact module on testwiki (T323526)]]
* 14:36 kharlan@deploy1002: Finished scap: Backport for [[gerrit:861506{{!}}[no-op] GrowthExperiments: Enable D3 in production (T318854)]] (duration: 06m 04s)
* 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:31 kharlan@deploy1002: kharlan and tgr: Backport for [[gerrit:861506{{!}}[no-op] GrowthExperiments: Enable D3 in production (T318854)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:30 kharlan@deploy1002: Started scap: Backport for [[gerrit:861506{{!}}[no-op] GrowthExperiments: Enable D3 in production (T318854)]]
* 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:27 kharlan@deploy1002: Finished scap: Backport for [[gerrit:862355{{!}}DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)]] (duration: 07m 25s)
* 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42176 and previous config saved to /var/cache/conftool/dbconfig/20221201-142735-ladsgroup.json
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:21 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:862355{{!}}DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:20 kharlan@deploy1002: Started scap: Backport for [[gerrit:862355{{!}}DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)]]
* 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
* 13:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
* 13:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42175 and previous config saved to /var/cache/conftool/dbconfig/20221201-132000-ladsgroup.json
* 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42174 and previous config saved to /var/cache/conftool/dbconfig/20221201-131950-ladsgroup.json
* 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42172 and previous config saved to /var/cache/conftool/dbconfig/20221201-130443-ladsgroup.json
* 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42171 and previous config saved to /var/cache/conftool/dbconfig/20221201-125821-ladsgroup.json
* 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 12:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 12:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42170 and previous config saved to /var/cache/conftool/dbconfig/20221201-124936-ladsgroup.json
* 12:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 12:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 12:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 12:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 12:43 moritzm: installing glibc security updates on buster
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42169 and previous config saved to /var/cache/conftool/dbconfig/20221201-124314-ladsgroup.json
* 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42168 and previous config saved to /var/cache/conftool/dbconfig/20221201-123430-ladsgroup.json
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42167 and previous config saved to /var/cache/conftool/dbconfig/20221201-122807-ladsgroup.json
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42166 and previous config saved to /var/cache/conftool/dbconfig/20221201-121301-ladsgroup.json
* 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42165 and previous config saved to /var/cache/conftool/dbconfig/20221201-120102-ladsgroup.json
* 11:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 11:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42164 and previous config saved to /var/cache/conftool/dbconfig/20221201-114555-ladsgroup.json
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
* 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
* 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42163 and previous config saved to /var/cache/conftool/dbconfig/20221201-113049-ladsgroup.json
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:18 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:862357{{!}}Fix broken search with vector-2022 on www.wikidata.org (T324148)]] (duration: 06m 56s)
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42162 and previous config saved to /var/cache/conftool/dbconfig/20221201-111542-ladsgroup.json
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 11:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:12 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for [[gerrit:862357{{!}}Fix broken search with vector-2022 on www.wikidata.org (T324148)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:862357{{!}}Fix broken search with vector-2022 on www.wikidata.org (T324148)]]
* 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42161 and previous config saved to /var/cache/conftool/dbconfig/20221201-110938-ladsgroup.json
* 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42160 and previous config saved to /var/cache/conftool/dbconfig/20221201-110916-ladsgroup.json
* 11:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 11:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42159 and previous config saved to /var/cache/conftool/dbconfig/20221201-105938-ladsgroup.json
* 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
* 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42158 and previous config saved to /var/cache/conftool/dbconfig/20221201-105916-ladsgroup.json
* 10:57 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web
* 10:56 elukey: deleted knative controller + net-istio controllers on ml-serve-eqiad to clear out some weird state (causing high latencies for the k8s api)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
* 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42157 and previous config saved to /var/cache/conftool/dbconfig/20221201-105410-ladsgroup.json
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42156 and previous config saved to /var/cache/conftool/dbconfig/20221201-104409-ladsgroup.json
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42155 and previous config saved to /var/cache/conftool/dbconfig/20221201-103903-ladsgroup.json
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42154 and previous config saved to /var/cache/conftool/dbconfig/20221201-103448-ladsgroup.json
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42153 and previous config saved to /var/cache/conftool/dbconfig/20221201-103426-ladsgroup.json
* 10:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
* 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42152 and previous config saved to /var/cache/conftool/dbconfig/20221201-102903-ladsgroup.json
* 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42151 and previous config saved to /var/cache/conftool/dbconfig/20221201-102357-ladsgroup.json
* 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42150 and previous config saved to /var/cache/conftool/dbconfig/20221201-101920-ladsgroup.json
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42149 and previous config saved to /var/cache/conftool/dbconfig/20221201-101754-ladsgroup.json
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42148 and previous config saved to /var/cache/conftool/dbconfig/20221201-101733-ladsgroup.json
* 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42147 and previous config saved to /var/cache/conftool/dbconfig/20221201-101356-ladsgroup.json
* 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42146 and previous config saved to /var/cache/conftool/dbconfig/20221201-100413-ladsgroup.json
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42145 and previous config saved to /var/cache/conftool/dbconfig/20221201-100227-ladsgroup.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42144 and previous config saved to /var/cache/conftool/dbconfig/20221201-094907-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42143 and previous config saved to /var/cache/conftool/dbconfig/20221201-094720-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42142 and previous config saved to /var/cache/conftool/dbconfig/20221201-093214-ladsgroup.json
* 09:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42141 and previous config saved to /var/cache/conftool/dbconfig/20221201-092455-ladsgroup.json
* 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42140 and previous config saved to /var/cache/conftool/dbconfig/20221201-092434-ladsgroup.json
* 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 09:19 kostajh: UTC morning deploys done
* 09:18 kharlan@deploy1002: Finished scap: Backport for [[gerrit:862354{{!}}User impact: Fix per-page pageview numbers (T323253)]] (duration: 08m 31s)
* 09:15 Emperor: depool, restart, repool swift-proxy on ms-fe1011
* 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:862354{{!}}User impact: Fix per-page pageview numbers (T323253)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:09 kharlan@deploy1002: Started scap: Backport for [[gerrit:862354{{!}}User impact: Fix per-page pageview numbers (T323253)]]
* 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42139 and previous config saved to /var/cache/conftool/dbconfig/20221201-090927-ladsgroup.json
* 09:07 moritzm: rebuilding raid on ganeti2013 [[phab:T323222|T323222]]
* 09:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2013.codfw.wmnet
* 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42138 and previous config saved to /var/cache/conftool/dbconfig/20221201-085421-ladsgroup.json
* 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 08:49 volans: restart idrac on mw1334, ipmi and remote ipmi works fine, ssh not responding
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42137 and previous config saved to /var/cache/conftool/dbconfig/20221201-084147-ladsgroup.json
* 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
* 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42136 and previous config saved to /var/cache/conftool/dbconfig/20221201-084125-ladsgroup.json
* 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42135 and previous config saved to /var/cache/conftool/dbconfig/20221201-084026-ladsgroup.json
* 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42134 and previous config saved to /var/cache/conftool/dbconfig/20221201-083914-ladsgroup.json
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42131 and previous config saved to /var/cache/conftool/dbconfig/20221201-082619-ladsgroup.json
* 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42130 and previous config saved to /var/cache/conftool/dbconfig/20221201-082519-ladsgroup.json
* 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42129 and previous config saved to /var/cache/conftool/dbconfig/20221201-082215-ladsgroup.json
* 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42128 and previous config saved to /var/cache/conftool/dbconfig/20221201-082154-ladsgroup.json
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42127 and previous config saved to /var/cache/conftool/dbconfig/20221201-081444-ladsgroup.json
* 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42126 and previous config saved to /var/cache/conftool/dbconfig/20221201-081433-ladsgroup.json
* 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42125 and previous config saved to /var/cache/conftool/dbconfig/20221201-081112-ladsgroup.json
* 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42124 and previous config saved to /var/cache/conftool/dbconfig/20221201-081013-ladsgroup.json
* 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42123 and previous config saved to /var/cache/conftool/dbconfig/20221201-080647-ladsgroup.json
* 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42122 and previous config saved to /var/cache/conftool/dbconfig/20221201-075927-ladsgroup.json
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42120 and previous config saved to /var/cache/conftool/dbconfig/20221201-075606-ladsgroup.json
* 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42119 and previous config saved to /var/cache/conftool/dbconfig/20221201-075506-ladsgroup.json
* 07:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 400474
* 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42118 and previous config saved to /var/cache/conftool/dbconfig/20221201-075140-ladsgroup.json
* 07:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 400474
* 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42117 and previous config saved to /var/cache/conftool/dbconfig/20221201-074420-ladsgroup.json
* 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42116 and previous config saved to /var/cache/conftool/dbconfig/20221201-073634-ladsgroup.json
* 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42115 and previous config saved to /var/cache/conftool/dbconfig/20221201-073015-ladsgroup.json
* 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42114 and previous config saved to /var/cache/conftool/dbconfig/20221201-072914-ladsgroup.json
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42113 and previous config saved to /var/cache/conftool/dbconfig/20221201-072659-ladsgroup.json
* 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42111 and previous config saved to /var/cache/conftool/dbconfig/20221201-071641-ladsgroup.json
* 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42110 and previous config saved to /var/cache/conftool/dbconfig/20221201-071615-ladsgroup.json
* 07:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
* 07:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
* 07:13 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 07:13 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 07:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 07:12 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42109 and previous config saved to /var/cache/conftool/dbconfig/20221201-071153-ladsgroup.json
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 [[phab:T323547|T323547]]', diff saved to https://phabricator.wikimedia.org/P42108 and previous config saved to /var/cache/conftool/dbconfig/20221201-070758-ladsgroup.json
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write [[phab:T323547|T323547]]', diff saved to https://phabricator.wikimedia.org/P42107 and previous config saved to /var/cache/conftool/dbconfig/20221201-070203-ladsgroup.json
* 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - [[phab:T323547|T323547]]', diff saved to https://phabricator.wikimedia.org/P42106 and previous config saved to /var/cache/conftool/dbconfig/20221201-070131-ladsgroup.json
* 07:01 Amir1: Starting s1 eqiad failover from db1163 to db1118 - [[phab:T323547|T323547]]
* 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42105 and previous config saved to /var/cache/conftool/dbconfig/20221201-070108-ladsgroup.json
* 06:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 06:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42104 and previous config saved to /var/cache/conftool/dbconfig/20221201-065737-ladsgroup.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42103 and previous config saved to /var/cache/conftool/dbconfig/20221201-065646-ladsgroup.json
* 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42102 and previous config saved to /var/cache/conftool/dbconfig/20221201-064602-ladsgroup.json
* 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42101 and previous config saved to /var/cache/conftool/dbconfig/20221201-064230-ladsgroup.json
* 06:42 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 06:42 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42100 and previous config saved to /var/cache/conftool/dbconfig/20221201-064140-ladsgroup.json
* 06:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42099 and previous config saved to /var/cache/conftool/dbconfig/20221201-063930-ladsgroup.json
* 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 06:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42098 and previous config saved to /var/cache/conftool/dbconfig/20221201-063908-ladsgroup.json
* 06:36 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 06:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42097 and previous config saved to /var/cache/conftool/dbconfig/20221201-063055-ladsgroup.json
* 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42096 and previous config saved to /var/cache/conftool/dbconfig/20221201-062724-ladsgroup.json
* 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42095 and previous config saved to /var/cache/conftool/dbconfig/20221201-062402-ladsgroup.json
* 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42094 and previous config saved to /var/cache/conftool/dbconfig/20221201-061218-ladsgroup.json
* 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42093 and previous config saved to /var/cache/conftool/dbconfig/20221201-060855-ladsgroup.json
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42092 and previous config saved to /var/cache/conftool/dbconfig/20221201-060230-ladsgroup.json
* 06:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42091 and previous config saved to /var/cache/conftool/dbconfig/20221201-060206-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 [[phab:T323547|T323547]]', diff saved to https://phabricator.wikimedia.org/P42090 and previous config saved to /var/cache/conftool/dbconfig/20221201-060157-ladsgroup.json
* 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T323547|T323547]]
* 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T323547|T323547]]
* 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42089 and previous config saved to /var/cache/conftool/dbconfig/20221201-055359-ladsgroup.json
* 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42088 and previous config saved to /var/cache/conftool/dbconfig/20221201-055349-ladsgroup.json
* 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42087 and previous config saved to /var/cache/conftool/dbconfig/20221201-055337-ladsgroup.json
* 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42086 and previous config saved to /var/cache/conftool/dbconfig/20221201-055239-ladsgroup.json
* 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42085 and previous config saved to /var/cache/conftool/dbconfig/20221201-055218-ladsgroup.json
* 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42084 and previous config saved to /var/cache/conftool/dbconfig/20221201-055142-ladsgroup.json
* 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42083 and previous config saved to /var/cache/conftool/dbconfig/20221201-055120-ladsgroup.json
* 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42082 and previous config saved to /var/cache/conftool/dbconfig/20221201-054653-ladsgroup.json
* 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42081 and previous config saved to /var/cache/conftool/dbconfig/20221201-053831-ladsgroup.json
* 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42080 and previous config saved to /var/cache/conftool/dbconfig/20221201-053711-ladsgroup.json
* 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42079 and previous config saved to /var/cache/conftool/dbconfig/20221201-053613-ladsgroup.json
* 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42078 and previous config saved to /var/cache/conftool/dbconfig/20221201-053147-ladsgroup.json
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42077 and previous config saved to /var/cache/conftool/dbconfig/20221201-052524-ladsgroup.json
* 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42076 and previous config saved to /var/cache/conftool/dbconfig/20221201-052325-ladsgroup.json
* 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42075 and previous config saved to /var/cache/conftool/dbconfig/20221201-052223-ladsgroup.json
* 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42074 and previous config saved to /var/cache/conftool/dbconfig/20221201-052205-ladsgroup.json
* 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42073 and previous config saved to /var/cache/conftool/dbconfig/20221201-052107-ladsgroup.json
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42072 and previous config saved to /var/cache/conftool/dbconfig/20221201-052014-ladsgroup.json
* 05:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42071 and previous config saved to /var/cache/conftool/dbconfig/20221201-051942-ladsgroup.json
* 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42070 and previous config saved to /var/cache/conftool/dbconfig/20221201-051640-ladsgroup.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42069 and previous config saved to /var/cache/conftool/dbconfig/20221201-050818-ladsgroup.json
* 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42068 and previous config saved to /var/cache/conftool/dbconfig/20221201-050658-ladsgroup.json
* 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42067 and previous config saved to /var/cache/conftool/dbconfig/20221201-050600-ladsgroup.json
* 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42066 and previous config saved to /var/cache/conftool/dbconfig/20221201-050548-ladsgroup.json
* 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42065 and previous config saved to /var/cache/conftool/dbconfig/20221201-050527-ladsgroup.json
* 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42064 and previous config saved to /var/cache/conftool/dbconfig/20221201-050435-ladsgroup.json
* 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42063 and previous config saved to /var/cache/conftool/dbconfig/20221201-045020-ladsgroup.json
* 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42062 and previous config saved to /var/cache/conftool/dbconfig/20221201-044929-ladsgroup.json
* 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42061 and previous config saved to /var/cache/conftool/dbconfig/20221201-044053-ladsgroup.json
* 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42060 and previous config saved to /var/cache/conftool/dbconfig/20221201-044031-ladsgroup.json
* 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42059 and previous config saved to /var/cache/conftool/dbconfig/20221201-043514-ladsgroup.json
* 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42058 and previous config saved to /var/cache/conftool/dbconfig/20221201-043422-ladsgroup.json
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42057 and previous config saved to /var/cache/conftool/dbconfig/20221201-043315-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42056 and previous config saved to /var/cache/conftool/dbconfig/20221201-043253-ladsgroup.json
* 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42055 and previous config saved to /var/cache/conftool/dbconfig/20221201-042525-ladsgroup.json
* 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42054 and previous config saved to /var/cache/conftool/dbconfig/20221201-042251-ladsgroup.json
* 04:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42053 and previous config saved to /var/cache/conftool/dbconfig/20221201-042229-ladsgroup.json
* 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42052 and previous config saved to /var/cache/conftool/dbconfig/20221201-042008-ladsgroup.json
* 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42051 and previous config saved to /var/cache/conftool/dbconfig/20221201-041758-ladsgroup.json
* 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42050 and previous config saved to /var/cache/conftool/dbconfig/20221201-041747-ladsgroup.json
* 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42049 and previous config saved to /var/cache/conftool/dbconfig/20221201-041652-ladsgroup.json
* 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42048 and previous config saved to /var/cache/conftool/dbconfig/20221201-041322-ladsgroup.json
* 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42047 and previous config saved to /var/cache/conftool/dbconfig/20221201-041018-ladsgroup.json
* 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42046 and previous config saved to /var/cache/conftool/dbconfig/20221201-040723-ladsgroup.json
* 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42045 and previous config saved to /var/cache/conftool/dbconfig/20221201-040240-ladsgroup.json
* 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42044 and previous config saved to /var/cache/conftool/dbconfig/20221201-040145-ladsgroup.json
* 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42043 and previous config saved to /var/cache/conftool/dbconfig/20221201-035816-ladsgroup.json
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42042 and previous config saved to /var/cache/conftool/dbconfig/20221201-035512-ladsgroup.json
* 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42041 and previous config saved to /var/cache/conftool/dbconfig/20221201-035216-ladsgroup.json
* 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42040 and previous config saved to /var/cache/conftool/dbconfig/20221201-034734-ladsgroup.json
* 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42039 and previous config saved to /var/cache/conftool/dbconfig/20221201-034639-ladsgroup.json
* 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42038 and previous config saved to /var/cache/conftool/dbconfig/20221201-034627-ladsgroup.json
* 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42037 and previous config saved to /var/cache/conftool/dbconfig/20221201-034527-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42036 and previous config saved to /var/cache/conftool/dbconfig/20221201-034309-ladsgroup.json
* 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42035 and previous config saved to /var/cache/conftool/dbconfig/20221201-033710-ladsgroup.json
* 03:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS buster
* 03:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42034 and previous config saved to /var/cache/conftool/dbconfig/20221201-033449-ladsgroup.json
* 03:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 03:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
* 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42033 and previous config saved to /var/cache/conftool/dbconfig/20221201-033132-ladsgroup.json
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42032 and previous config saved to /var/cache/conftool/dbconfig/20221201-033020-ladsgroup.json
* 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42031 and previous config saved to /var/cache/conftool/dbconfig/20221201-032922-ladsgroup.json
* 03:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 03:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42030 and previous config saved to /var/cache/conftool/dbconfig/20221201-032901-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42029 and previous config saved to /var/cache/conftool/dbconfig/20221201-032803-ladsgroup.json
* 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42028 and previous config saved to /var/cache/conftool/dbconfig/20221201-032553-ladsgroup.json
* 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42027 and previous config saved to /var/cache/conftool/dbconfig/20221201-032531-ladsgroup.json
* 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42026 and previous config saved to /var/cache/conftool/dbconfig/20221201-031608-ladsgroup.json
* 03:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 03:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42025 and previous config saved to /var/cache/conftool/dbconfig/20221201-031546-ladsgroup.json
* 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42024 and previous config saved to /var/cache/conftool/dbconfig/20221201-031514-ladsgroup.json
* 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42023 and previous config saved to /var/cache/conftool/dbconfig/20221201-031354-ladsgroup.json
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42022 and previous config saved to /var/cache/conftool/dbconfig/20221201-031024-ladsgroup.json
* 03:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
* 03:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
* 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42021 and previous config saved to /var/cache/conftool/dbconfig/20221201-030040-ladsgroup.json
* 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42020 and previous config saved to /var/cache/conftool/dbconfig/20221201-030007-ladsgroup.json
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42019 and previous config saved to /var/cache/conftool/dbconfig/20221201-025900-ladsgroup.json
* 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42018 and previous config saved to /var/cache/conftool/dbconfig/20221201-025848-ladsgroup.json
* 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42017 and previous config saved to /var/cache/conftool/dbconfig/20221201-025838-ladsgroup.json
* 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42016 and previous config saved to /var/cache/conftool/dbconfig/20221201-025517-ladsgroup.json
* 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42015 and previous config saved to /var/cache/conftool/dbconfig/20221201-024533-ladsgroup.json
* 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42014 and previous config saved to /var/cache/conftool/dbconfig/20221201-024341-ladsgroup.json
* 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42013 and previous config saved to /var/cache/conftool/dbconfig/20221201-024331-ladsgroup.json
* 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42012 and previous config saved to /var/cache/conftool/dbconfig/20221201-024131-ladsgroup.json
* 02:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 02:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42011 and previous config saved to /var/cache/conftool/dbconfig/20221201-024110-ladsgroup.json
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42010 and previous config saved to /var/cache/conftool/dbconfig/20221201-024011-ladsgroup.json
* 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42009 and previous config saved to /var/cache/conftool/dbconfig/20221201-023801-ladsgroup.json
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42008 and previous config saved to /var/cache/conftool/dbconfig/20221201-023750-ladsgroup.json
* 02:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
* 02:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
* 02:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P42007 and previous config saved to /var/cache/conftool/dbconfig/20221201-023027-ladsgroup.json
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42006 and previous config saved to /var/cache/conftool/dbconfig/20221201-022825-ladsgroup.json
* 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42005 and previous config saved to /var/cache/conftool/dbconfig/20221201-022603-ladsgroup.json
* 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P42004 and previous config saved to /var/cache/conftool/dbconfig/20221201-022244-ladsgroup.json
* 02:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
* 02:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5027.eqsin.wmnet with OS buster
* 02:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
* 02:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
* 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42003 and previous config saved to /var/cache/conftool/dbconfig/20221201-021318-ladsgroup.json
* 02:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
* 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42002 and previous config saved to /var/cache/conftool/dbconfig/20221201-021211-ladsgroup.json
* 02:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 02:12 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
* 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42001 and previous config saved to /var/cache/conftool/dbconfig/20221201-021149-ladsgroup.json
* 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42000 and previous config saved to /var/cache/conftool/dbconfig/20221201-021057-ladsgroup.json
* 02:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 02:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 02:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P41999 and previous config saved to /var/cache/conftool/dbconfig/20221201-020737-ladsgroup.json
* 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P41998 and previous config saved to /var/cache/conftool/dbconfig/20221201-020308-ladsgroup.json
* 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
* 01:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
* 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41997 and previous config saved to /var/cache/conftool/dbconfig/20221201-015643-ladsgroup.json
* 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P41996 and previous config saved to /var/cache/conftool/dbconfig/20221201-015550-ladsgroup.json
* 01:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P41995 and previous config saved to /var/cache/conftool/dbconfig/20221201-015340-ladsgroup.json
* 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P41994 and previous config saved to /var/cache/conftool/dbconfig/20221201-015332-ladsgroup.json
* 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41993 and previous config saved to /var/cache/conftool/dbconfig/20221201-015230-ladsgroup.json
* 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T318605|T318605]])', diff saved to https://phabricator.wikimedia.org/P41992 and previous config saved to /var/cache/conftool/dbconfig/20221201-015115-ladsgroup.json
* 01:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41991 and previous config saved to /var/cache/conftool/dbconfig/20221201-015020-ladsgroup.json
* 01:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41990 and previous config saved to /var/cache/conftool/dbconfig/20221201-015010-ladsgroup.json
* 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41989 and previous config saved to /var/cache/conftool/dbconfig/20221201-014136-ladsgroup.json
* 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41988 and previous config saved to /var/cache/conftool/dbconfig/20221201-013503-ladsgroup.json
* 01:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
* 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41987 and previous config saved to /var/cache/conftool/dbconfig/20221201-012630-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41986 and previous config saved to /var/cache/conftool/dbconfig/20221201-012522-ladsgroup.json
* 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41985 and previous config saved to /var/cache/conftool/dbconfig/20221201-012500-ladsgroup.json
* 01:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS buster
* 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41984 and previous config saved to /var/cache/conftool/dbconfig/20221201-011957-ladsgroup.json
* 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41983 and previous config saved to /var/cache/conftool/dbconfig/20221201-010954-ladsgroup.json
* 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41982 and previous config saved to /var/cache/conftool/dbconfig/20221201-010450-ladsgroup.json
* 01:04 ejegg: payments-wiki upgraded from {{Gerrit|96c74911}} to {{Gerrit|c52a6a39}}
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41981 and previous config saved to /var/cache/conftool/dbconfig/20221201-010240-ladsgroup.json
* 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41980 and previous config saved to /var/cache/conftool/dbconfig/20221201-010219-ladsgroup.json
* 00:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41979 and previous config saved to /var/cache/conftool/dbconfig/20221201-005447-ladsgroup.json
* 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
* 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41978 and previous config saved to /var/cache/conftool/dbconfig/20221201-004712-ladsgroup.json
* 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41977 and previous config saved to /var/cache/conftool/dbconfig/20221201-003941-ladsgroup.json
* 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41976 and previous config saved to /var/cache/conftool/dbconfig/20221201-003533-ladsgroup.json
* 00:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 00:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41975 and previous config saved to /var/cache/conftool/dbconfig/20221201-003511-ladsgroup.json
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41974 and previous config saved to /var/cache/conftool/dbconfig/20221201-003205-ladsgroup.json
* 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS buster
* 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1206.eqiad.wmnet with OS bullseye
* 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41973 and previous config saved to /var/cache/conftool/dbconfig/20221201-002005-ladsgroup.json
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41972 and previous config saved to /var/cache/conftool/dbconfig/20221201-001659-ladsgroup.json
* 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41971 and previous config saved to /var/cache/conftool/dbconfig/20221201-001449-ladsgroup.json
* 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P41970 and previous config saved to /var/cache/conftool/dbconfig/20221201-001427-ladsgroup.json
* 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41969 and previous config saved to /var/cache/conftool/dbconfig/20221201-000458-ladsgroup.json


== Archives ==
See [[Server Admin Log/Archives]].

== 2020-04-15 ==
* 22:11 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/MachineVision: Fix: Initialize categories array for initial images ([[phab:T250321|T250321]]) (duration: 01m 07s)
* 21:48 maryum: removing duplicate indices from production ES clusters that were created when reindexing failed
* 20:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1907571]: Update mobileapps to {{Gerrit|ff34d0b5}} (duration: 04m 57s)
* 20:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1907571]: Update mobileapps to {{Gerrit|ff34d0b5}}
* 19:53 addshore: pool wdqs1006 caught up
* 19:44 addshore: depool wdqs1006 to catch up on lag
* 19:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.28 (duration: 01m 05s)
* 19:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.28
* 18:44 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Idc81a885b2f3}}, [[phab:T196309|T196309]] (duration: 01m 07s)
* 18:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 18:10 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:589029{{!}}Fix GrowthExperiments helpdesk URL for frwiktionary (T235964)]] (duration: 01m 06s)
* 16:08 volker-e@deploy1001: Finished deploy [design/style-guide@a4d5794]: Deploy design/style-guide:  (duration: 00m 11s)
* 16:08 volker-e@deploy1001: Started deploy [design/style-guide@a4d5794]: Deploy design/style-guide:
* 15:46 ejegg: updated fundraising CiviCRM from {{Gerrit|18d7567cd7}} to {{Gerrit|1224b080c1}}
* 15:36 ema: cp2029,cp3050: upgrade purged to 0.6, restart varnish-fe [[phab:T249583|T249583]]
* 15:30 ema: upload purged 0.6 to buster-wikimedia [[phab:T249583|T249583]]
* 15:19 papaul: upgrading firmware on restbase2014
* 14:36 vgutierrez: rolling upgrade to ATS 8.0.7-rc0-1wm2 on cp[3064,3065,2042,2041,1090,1089] - [[phab:T249335|T249335]]
* 14:32 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: Drop 'parsoidphp' service, we use 'parsoid' now (duration: 01m 06s)
* 14:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Use 'parsoid' service in lieu of 'parsoidphp' (duration: 01m 07s)
* 14:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
* 14:23 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: Add 'parsoid' service to replace 'parsoidphp' (duration: 01m 06s)
* 14:17 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Use MediaWikiServices::getAuthManager on wikitech (duration: 01m 06s)
* 14:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242912|T242912]] Remove wgEnablePartialBlocks config, no longer read (duration: 01m 07s)
* 14:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wmgExtraLanguageNames: Remove 'smn', supported by core since 1.35.0-wmf.26 (duration: 01m 06s)
* 14:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 06s)
* 14:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T250181|T250181]] [[phab:T250183|T250183]] Wikibase: Use false instead of database names for 'local' entity sources on test wikis (duration: 01m 06s)
* 14:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 14:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread (duration: 01m 06s)
* 13:59 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true (duration: 01m 06s)
* 13:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgContentHandlerUseDB, now unread (duration: 01m 06s)
* 13:32 ema: upload varnish_5.1.3-1wm14 to buster-wikimedia [[phab:T249810|T249810]]
* 13:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/Flow/Hooks.php: [[phab:T248727|T248727]] Adjust to RevisionUndeleted hook now having  (duration: 01m 04s)
* 13:25 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/LiquidThreads/classes/DeletionController.php: [[phab:T248727|T248727]] Adjust to RevisionUndeleted hook now having  (duration: 01m 06s)
* 13:23 jforrester@deploy1001: Synchronized php-1.35.0-wmf.28/includes/page/PageArchive.php: [[phab:T248727|T248727]] Fix RevisionUndeleted hook to add  (duration: 01m 08s)
* 13:23 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight to 100% of target, and reduce db1104 slightly [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10990 and previous config saved to /var/cache/conftool/dbconfig/20200415-132310-kormat.json
* 13:10 hashar: contint2001: starting zuul-merger process # [[phab:T224591|T224591]]
* 12:49 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight to 50% of target [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10989 and previous config saved to /var/cache/conftool/dbconfig/20200415-124931-kormat.json
* 12:41 vgutierrez: rolling upgrade to ATS 8.0.7-rc0-1wm2 in ulsfo and eqsin - [[phab:T249335|T249335]]
* 12:03 mutante: puppetmaster1001: revoking ganeti01.svc.eqiad.wmnet and ganeti01.svc.codfw.wmnet certificates. adding eqiad and codfw to cergen .yaml file, recreating ganeti certs
* 11:27 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 01m 03s)
* 11:26 awight@deploy1001: sync-file aborted: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (double-sync) (duration: 00m 02s)
* 11:23 awight: EU SWAT complete
* 11:22 awight@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/TwoColConflict: SWAT: [[gerrit:588966{{!}}Flatten exit logging (T248601)]] (duration: 01m 09s)
* 11:09 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:588701{{!}}Deploy Welcome Survey to Serbian Wikipedia and French Wiktionary (T249956)]] (duration: 01m 24s)
* 10:57 marostegui: Deploy schema change on s8 codfw master - [[phab:T250057|T250057]]
* 10:25 ema: cp3050: varnish-frontend-restart to clear mbox lag and see how long it takes to show up [[phab:T249583|T249583]]
* 10:02 ema: upload purged 0.5 to buster-wikimedia [[phab:T249583|T249583]]
* 09:50 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 09:48 vgutierrez: disable KA between ats-tls and varnish-fe for POST requests on eqiad - [[phab:T250258|T250258]]
* 09:45 godog: force-run curator from logstash1008 - [[phab:T250133|T250133]]
* 09:43 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight some more [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10988 and previous config saved to /var/cache/conftool/dbconfig/20200415-094305-kormat.json
* 09:08 elukey: restart druid brokers on druid100[4-6] - stuck after datasource deletion
* 09:07 vgutierrez: repool cp1081
* 08:54 kormat@cumin1001: dbctl commit (dc=all): 'Increase db1114's weight [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10986 and previous config saved to /var/cache/conftool/dbconfig/20200415-085432-kormat.json
* 08:54 vgutierrez: depool cp1081 for debugging purposes
* 08:46 XioNoX: reset edac counters on scb1001
* 08:43 dcausse: errata: elastic (search cluster) reindexing commonswiki_content on cloudelastic ([[phab:T246882|T246882]])
* 08:42 dcausse: elastic (search cluster) reindex commmonswiki_content on cloudelastic ([[phab:T246882|T246882]])
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 on s8 with low weight [[phab:T250224|T250224]]', diff saved to https://phabricator.wikimedia.org/P10985 and previous config saved to /var/cache/conftool/dbconfig/20200415-081421-marostegui.json
* 07:59 marostegui: Deploy schema change on s7 codfw master - [[phab:T250057|T250057]]
* 07:35 elukey: restart cloudelastic-chi on cloudelastic1002 to apply new jvm settings - [[phab:T231517|T231517]]
* 06:55 mutante: install1003 moving /srv/autoinstall to /root, running puppet, leaving a README file to point out it moved to apt1001
* 06:47 marostegui: Deploy schema change on s6 codfw with replication - [[phab:T250057|T250057]]
* 06:43 marostegui: Deploy schema change on labtestwiki - [[phab:T250057|T250057]]
* 06:43 XioNoX: re-set asw2-c-eqiad's licenses
* 06:42 marostegui: Deploy schema change on labswiki - [[phab:T250057|T250057]]
* 06:32 XioNoX: set uRPF log action back to log infra wide - [[phab:T244147|T244147]]
* 06:04 vgutierrez: update to ats 8.0.7-rc0-1wm2 on cp[5006,5012] - [[phab:T249335|T249335]]
* 05:49 moritzm: installing git security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:22 kart_: Update cxserver to 2020-04-13-094138-production ([[phab:T239459|T239459]], [[phab:T249469|T249469]])
* 05:21 marostegui: Remove db1114 from tendril and zarcillo [[phab:T250224|T250224]]
* 05:17 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:13 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:11 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:07 marostegui: Remove db1114 from tendril - [[phab:T250224|T250224]]
 
== 2020-04-14 ==
* 23:24 AndyRussG: re-enabled thank-you, omnimailing and new recurring charge jobs
* 22:59 AndyRussG: disabled thank-you and omnimailing jobs
* 22:59 AndyRussG: fundraising civicrm revision changed from {{Gerrit|59e712ce8e}} to {{Gerrit|18d7567cd7}}
* 21:36 addshore: pool wdqs1006, it is caught up
* 21:03 addshore: depool wdqs1006 to give it a chance to catch up on lag
* 20:34 cdanis@cumin1001: dbctl commit (dc=all): 'tweak db1111 weight yet again', diff saved to https://phabricator.wikimedia.org/P10979 and previous config saved to /var/cache/conftool/dbconfig/20200414-203426-cdanis.json
* 20:18 James_F: Adding Create-Signed-Tag right to wikimedia-ui-base group for wikimedia-ui-base repo
* 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Change s8 weights', diff saved to https://phabricator.wikimedia.org/P10978 and previous config saved to /var/cache/conftool/dbconfig/20200414-201412-marostegui.json
* 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'reduce db1126 weight due to cpu issues', diff saved to https://phabricator.wikimedia.org/P10977 and previous config saved to /var/cache/conftool/dbconfig/20200414-195855-marostegui.json
* 19:57 cdanis@cumin1001: dbctl commit (dc=all): '+db1111, -db1126', diff saved to https://phabricator.wikimedia.org/P10976 and previous config saved to /var/cache/conftool/dbconfig/20200414-195734-cdanis.json
* 19:51 cdanis@cumin1001: dbctl commit (dc=all): 'more weight to db1104', diff saved to https://phabricator.wikimedia.org/P10975 and previous config saved to /var/cache/conftool/dbconfig/20200414-195100-cdanis.json
* 19:47 cdanis@cumin1001: dbctl commit (dc=all): '+weight on db1104@s8', diff saved to https://phabricator.wikimedia.org/P10974 and previous config saved to /var/cache/conftool/dbconfig/20200414-194710-cdanis.json
* 19:26 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.28
* 19:22 ebernhardson@deploy1001: Finished scap: wmf-config/PoolCounterSettings.php cirrus: increase pool counter size for traffic shift to codfw (duration: 21m 55s)
* 19:00 ebernhardson@deploy1001: Started scap: wmf-config/PoolCounterSettings.php cirrus: increase pool counter size for traffic shift to codfw
* 18:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:35 jforrester@deploy1001: Finished scap: Testwikis to php-1.35.0-wmf.28 and rebuild i18n cache for [[phab:T247775|T247775]] (duration: 42m 37s)
* 17:26 ppchelko@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again (duration: 00m 56s)
* 17:25 ppchelko@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules, again
* 17:23 ppchelko@deploy1001: deploy aborted: Rollback removing k8s rules, again (duration: 00m 05s)
* 17:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Rollback removing k8s rules, again
* 17:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] attempt 2 (duration: 00m 25s)
* 17:12 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] attempt 2
* 17:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:07 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:52 jforrester@deploy1001: Started scap: Testwikis to php-1.35.0-wmf.28 and rebuild i18n cache for [[phab:T247775|T247775]]
* 16:49 jforrester@deploy1001: sync aborted: testwikis wikis to 1.35.0-wmf.28 (duration: 00m 05s)
* 16:49 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.28
* 16:38 akosiaris: stop all ganeti components (VMs are fine) on all ganeti2* hosts for key/cert rollover
* 16:38 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.25 (duration: 17m 20s)
* 16:20 James_F: Scap cleaning 1.35.0-wmf.25 [[phab:T247775|T247775]]
* 16:07 ariel@deploy1001: Finished deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression, retry (duration: 00m 04s)
* 16:06 ariel@deploy1001: Started deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression, retry
* 16:06 ppchelko@deploy1001: Finished deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules (duration: 01m 20s)
* 16:06 ejegg: disabled new recurring payments charge job
* 16:05 ppchelko@deploy1001: Started deploy [changeprop/deploy@baf0a4b]: Rollback removing k8s rules
* 16:04 ariel@deploy1001: Finished deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression (duration: 00m 04s)
* 16:04 ariel@deploy1001: Started deploy [dumps/dumps@90cbab0]: fix listing of input files for 7z recompression
* 15:52 ema: cp3050: suspend purged testing, varnish-frontend-restart to clear mailbox lag [[phab:T249583|T249583]]
* 15:50 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:49 James_F: 1.35.0-wmf.28 was branched at {{Gerrit|ded5b87df12cea88d94dde0fa22cac13227f8e92}} for [[phab:T247775|T247775]]
* 15:47 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:19 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:17 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:15 vgutierrez: update to ats 8.0.7-rc0-1wm2 on cp[4026,4032] - [[phab:T249335|T249335]]
* 15:13 vgutierrez: upload trafficserver 8.0.7-rc0-1wm2 to apt.wm.o (buster) - [[phab:T249335|T249335]]
* 15:12 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:11 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:44 ppchelko@deploy1001: Finished deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]] (duration: 01m 58s)
* 14:42 ppchelko@deploy1001: Started deploy [changeprop/deploy@354ae2d]: Remove rules enabled in k8s [[phab:T248677|T248677]]
* 14:34 godog: power down ms-be1023 - [[phab:T249174|T249174]]
* 14:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 elukey: enable TLS between weblog1001,mwlog2001.codfw.wmnet,mwlog1001 and Kafka Jumbo/Logging - [[phab:T250147|T250147]]
* 14:15 hashar: Rebasing mediawiki-config on deploy1001 for a deployment-prep config change ( https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/588706/ )
* 14:12 ema: cp3050: resume purged testing [[phab:T249583|T249583]]
* 13:55 ema: upload purged 0.4 to buster-wikimedia [[phab:T249583|T249583]]
* 13:21 hashar: Starting zuul-merger on contint2001
* 12:50 vgutierrez: Enable inbound TLSv1.3 in text@eqsin - [[phab:T170567|T170567]]
* 12:03 jbond42: upgrade haproxy on dns servers
* 11:08 Urbanecm: EU SWAT done
* 11:05 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/cswiki*.png ([[phab:T249173|T249173]])
* 11:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|7da408e}}: Revert "Enable cswiki anniversary logo" ([[phab:T249173|T249173]]) (duration: 01m 00s)
* 11:01 jynus: resizing backup1001:/srv/databases to 40 TB
* 10:55 XioNoX: set uRPF log action to syslog infra wide - [[phab:T244147|T244147]]
* 10:15 XioNoX: update prefix-list LVS-service-ips to add missing prefixes
* 09:49 XioNoX: re-order aggregate routes to standardize order
* 09:48 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr2-eqdfw - [[phab:T246721|T246721]]
* 09:47 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr2-eqord - [[phab:T246721|T246721]]
* 09:37 XioNoX: cleanup 2620:0:860::/46 and 208.80.152.0/22 aggregates from cr1/2-codfw - [[phab:T246721|T246721]]
* 09:17 XioNoX: add missing `routing-options rib inet6.0 aggregate defaults discard` where missing (cr3-knams, cr3-esams, cr2-eqord, cr2-eqdfw, cr1/2-eqiad/codfw)
* 09:13 godog: add mwilliams to 'wmf' ldap group - [[phab:T249844|T249844]]
* 09:08 marostegui: Add kormat to ops and wmf ldap groups - [[phab:T250134|T250134]]
* 08:49 elukey: restart elastic-chi on cloudelastic1001 with -XX:NewSize=10G - [[phab:T231517|T231517]]
* 07:33 elukey: apply CMS GC settings to chi on cloudelastic1001 - [[phab:T231517|T231517]]
* 05:30 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in esams and eqiad
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 01m 00s)
 
== 2020-04-13 ==
* 23:24 mdholloway: re-ran extensions/MachineVision/maintenance/withholdImages.php on commonswiki
* 23:14 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision withholding list additions ([[phab:T249939|T249939]]) (duration: 00m 59s)
* 22:41 cdanis: repool codfw
* 22:35 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1052 for excessive old gc over last few hours
* 22:35 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1052
* 22:08 cdanis: depool codfw
* 21:43 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki ([[phab:T249273|T249273]])
* 21:34 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on testcommonswiki
* 21:32 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add script to apply blacklist to current labels ([[phab:T249273|T249273]]) (duration: 00m 58s)
* 20:49 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision blocklist update ([[phab:T249895|T249895]]) (duration: 00m 59s)
* 19:56 mdholloway: finished running extensions/MachineVision/maintenance/withholdImages.php on commonswiki ([[phab:T249939|T249939]])
* 19:51 mdholloway: running extensions/MachineVision/maintenance/withholdImages.php on commonswiki
* 19:41 mdholloway: ran extensions/MachineVision/maintenance/withholdImages.php on testcommonswiki
* 19:37 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision: Add support for WITHHOLD_ALL review state ([[phab:T249939|T249939]]) (duration: 01m 23s)
* 19:13 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Add MachineVisionWithholdImageList config ([[phab:T249939|T249939]]) (duration: 01m 03s)
* 19:06 niedzielski: Morning SWAT done
* 19:02 niedzielski@deploy1001: Synchronized php-1.35.0-wmf.27/skins/MinervaNeue: SWAT: [[gerrit:588405{{!}}Update the icon glyph (T249864)]] (duration: 01m 00s)
* 18:49 niedzielski@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TwoColConflict: SWAT: [[gerrit:588370{{!}}Fix double HTML escaping of "copytext" lines in the diff (T249986)]] (duration: 01m 01s)
* 17:01 XioNoX: sample before any other border-in terms in eqiad
* 16:57 XioNoX: sample before any other border-in terms in esams
* 16:50 XioNoX: sample before any other border-in terms in dfw
* 16:46 XioNoX: sample before any other border-in terms in ulsfo
* 16:36 XioNoX: sample before any other border-in terms in eqsin
* 16:36 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:33 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:31 XioNoX: Sample all inbound v6 traffic on cr2-eqsin
* 16:31 cmjohnson1: replacing msw-c6-eqiad
* 16:30 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 15:56 marostegui: Deploy schema change on s4 codfw master - [[phab:T250067|T250067]]
* 12:12 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in eqsin and codfw
* 11:58 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 11:57 marostegui: Deploy schema change on eqiad s8 hosts - [[phab:T250062|T250062]]
* 11:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:53 marostegui: Deploy schema change on codfw master (lag will appear on codfw) - [[phab:T250062|T250062]]
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|efe2feb}}: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki ([[phab:T248860|T248860]]; take II) (duration: 00m 58s)
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|efe2feb}}: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki ([[phab:T248860|T248860]]) (duration: 00m 58s)
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:588383{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:588383{{!}} Bumping portals to master (563985)]] (duration: 01m 00s)
* 10:24 mutante: depooled wdqs1004 by request because of high lag
* 10:19 marostegui: Kill updateSpecialPages.php --only=Fewestrevisions for s8 in mwmaint1002, the vslow host is lagging and creating errors
* 10:12 mutante: mwmaint1002 - sudo systemctl status mediawiki_job_translationnotifications-mediawikiwiki.service
* 09:52 Urbanecm: Rename user account Gerakiw@grwikimedia to Geraki@grwikimedia ([[phab:T245911|T245911]])
* 09:47 Urbanecm: mwscript createAndPromote.php --wiki=grwikimedia --force Gerakiw <redacted> ([[phab:T245911|T245911]])
* 08:15 marostegui: Remove grants for haproxy@10.64.37.15 from labsdb hosts [[phab:T231280|T231280]]
* 07:50 vgutierrez: enable memory tracking in ats-tls on cp1085 - [[phab:T249335|T249335]]
* 07:43 marostegui: Compress db1092 [[phab:T232446|T232446]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Temporary pool db1111 in s8 API', diff saved to https://phabricator.wikimedia.org/P10964 and previous config saved to /var/cache/conftool/dbconfig/20200413-074158-marostegui.json
* 07:40 vgutierrez: rolling upgrade to ats 8.0.7-rc0-1wm1 in ulsfo
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10963 and previous config saved to /var/cache/conftool/dbconfig/20200413-073939-marostegui.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 [[phab:T249973|T249973]]', diff saved to https://phabricator.wikimedia.org/P10962 and previous config saved to /var/cache/conftool/dbconfig/20200413-071740-marostegui.json
* 06:51 marostegui: Deploy schema changes on db1110 - [[phab:T249973|T249973]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 [[phab:T249973|T249973]]', diff saved to https://phabricator.wikimedia.org/P10961 and previous config saved to /var/cache/conftool/dbconfig/20200413-065022-marostegui.json
* 06:36 elukey: temporarily stopped puppet on restbase2014 to avoid attempts to start cassandra on each run - [[phab:T250050|T250050]]
* 06:23 vgutierrez: upgrade to ats 8.0.7-rc0-1wm1 on cp[4026,4032,5006,5012]
* 06:20 vgutierrez: upload trafficserver 8.0.7-rc0-1wm1 to apt.wm.o (buster)
* 05:25 vgutierrez: restart varnish-fe on cp3050
 
== 2020-04-12 ==
* 11:11 vgutierrez: restart ats-tls on cp5008.eqsin.wmnet - [[phab:T249335|T249335]]
* 10:18 elukey: restart wdqs-updater on wdqs1004 (logs show no reports from the past hours, the last ones were stack traces related to a JSON decode failure)
* 06:59 dcausse: restarting blazegraph on wdqs1004 ([[phab:T242453|T242453]])
* 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet
* 06:32 elukey: powerdown restbase1025 - [[phab:T250027|T250027]]
* 06:21 elukey: powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2)
* 05:53 bblack: pushing https://gerrit.wikimedia.org/r/588134 to cache_text
* 05:50 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]
 
== 2020-04-11 ==
* 19:52 cdanis@cumin1001: dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json
* 17:35 cdanis@cumin1001: dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json
* 15:39 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 09:20 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 07:01 vgutierrez: restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]
 
== 2020-04-10 ==
* 21:12 cdanis@cumin1001: dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json
* 19:37 cdanis: cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220
* 15:03 vgutierrez: restart ats-tls on cp1083 and cp1085 - [[phab:T249335|T249335]]
* 13:14 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s)
* 13:14 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 13:12 mutante: restarted and re-armed keyholder on deploy1001 to pick up changes for zuul scap deploy
* 12:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:11 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. ([[phab:T249907|T249907]])
* 12:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. This may take a few minutes.
* 12:10 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:09 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 11:44 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 11:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10953 and previous config saved to /var/cache/conftool/dbconfig/20200410-094359-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10952 and previous config saved to /var/cache/conftool/dbconfig/20200410-093129-marostegui.json
* 08:52 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 16s)
* 08:51 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 08:46 hashar@deploy1001: Finished deploy [zuul/deploy@5a0a03a]: (no justification provided) (duration: 02m 20s)
* 08:44 hashar@deploy1001: Started deploy [zuul/deploy@5a0a03a]: (no justification provided)
* 08:39 mutante: deploy1001 - keyholder disarm, keyholder arm
* 08:32 mutante: fix comment in deployment ssh key for zuul to include the path to the key on deploy1001
* 08:24 vgutierrez: update puppet compiler facts
* 08:20 hashar@deploy1001: Finished deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) (duration: 00m 11s)
* 08:19 hashar@deploy1001: Started deploy [integration/zuul/deploy@6c3ddad]: (no justification provided)
* 08:03 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 05s)
* 08:03 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 07:52 mutante: closing port 80 on phab hosts for caching servers
* 07:37 ema: cp3050: back to vhtcpd for the holidays [[phab:T249583|T249583]]
* 07:00 mutante: sodium - sudo -u mirror ftpsync
* 06:58 mutante: armed keyholder on deploy1001
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:00 marostegui: Stop MySQL on pc1008 for upgrade
 
== 2020-04-09 ==
* 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:27 catrope@deploy1001: Synchronized wmf-config/mobile.php: Drop fallback support for wgMobileFrontendLogo ([[phab:T248500|T248500]]) (duration: 00m 58s)
* 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop unused config for main page CSS ([[phab:T243996|T243996]]) (duration: 00m 58s)
* 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add extendedconfirmed group and protection level on jawiki ([[phab:T249820|T249820]]) (duration: 00m 59s)
* 22:01 sukhe: running initial metadb sync on cescout1001
* 19:43 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:41 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:39 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 19:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:01 longma: deploying 1.35.0-wmf.27 to all wikis
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:24 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:32 XioNoX: disable down interfaces from fasw-c-codfw (mintaka)
* 13:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:43 mlitn@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision/: [MachineVision] Fix statement creation from suggestion (duration: 01m 09s)
* 12:31 ema: cp3051: upgrade varnish to 5.1.3-1wm13 once again, restart varnish-fe [[phab:T249809|T249809]]
* 11:57 XioNoX: offload more traffic from NTT eqiad - [[phab:T249808|T249808]]
* 11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]], take II (duration: 01m 06s)
* 11:19 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]] (duration: 01m 07s)
* 10:50 vgutierrez: rolling upgrade to trafficserver 8.0.6-1wm7
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:50 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:43 ema: repool cp3051 [[phab:T249809|T249809]]
* 10:30 ema: cp3051: re-enable transient storage limit, downgrade varnish to 5.1.3-1wm12 (no 0035-vbf_stp_condfetch_crash.patch) and restart varnish-fe [[phab:T249809|T249809]]
* 09:46 ema: cp3051: disable transient storage limit and restart varnish-fe [[phab:T249809|T249809]]
* 09:31 XioNoX: offload traffic from NTT eqiad - [[phab:T249808|T249808]]
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) ([[phab:T224591|T224591]])
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
* 07:24 moritzm: synched jenkins 222.1 to apt.wikimedia.org (buster-wikimedia, thirdparty/ci) [[phab:T224591|T224591]]
* 07:12 marostegui: Repool labsdb1011
* 07:10 XioNoX: switch urpf from log to syslog in ulsfo
* 07:04 XioNoX: re-activate BGP to Zayo in eqiad
* 06:59 vgutierrez: upgrade ats to version 8.0.6-1wm7 in cp[4026,4032,5006,5012]
* 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:43 XioNoX: confirmed on one host that the change didn't break logstash. Re-enable Puppet on logstash hosts - [[phab:T244147|T244147]]
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 XioNoX: disabling puppet on logstash host for CR deploy - [[phab:T244147|T244147]]
* 06:30 XioNoX: push urpf log only to eqiad - [[phab:T244147|T244147]]
* 06:25 XioNoX: push urpf log only to eqsin - [[phab:T244147|T244147]]
* 06:21 XioNoX: push urpf log only to AMS - [[phab:T244147|T244147]]
* 05:40 vgutierrez: upgrade ats to version 8.0.6-1wm6 in cp[4025,4031,5005,5011] - [[phab:T249335|T249335]]
* 05:37 marostegui: Stop MySQL on pc2008 for upgrade to Buster and 10.4
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 01m 08s)
* 05:08 marostegui: Deploy schema change on db1123
* 05:07 vgutierrez: upload trafficserver 8.0.6-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]]
 
== 2020-04-08 ==
* 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 07s)
* 21:19 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 09s)
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 19:51 gehel: restart wdqs-updater after deployment
* 19:49 mstyles@deploy1001: Finished deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21 (duration: 14m 37s)
* 19:44 dpifke@deploy1001: Finished deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091 (duration: 00m 05s)
* 19:44 dpifke@deploy1001: Started deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091
* 19:35 mstyles@deploy1001: Started deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21
* 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]] (duration: 01m 06s)
* 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:02 longma: deploying 1.35.0-wmf.27 to group1
* 18:37 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/skins/Vector: [[phab:T248761|T248761]]: Revert moving indicators in DOM (duration: 01m 07s)
* 18:17 reedy@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 06s)
* 18:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 10s)
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 ema: cache_upload: rolling varnish-fe restarts to bump transient storage limit [[phab:T185968|T185968]]
* 15:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 ema: cp3051: param.set shortlived=0 to try to ease pressure on transient memory
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P10947 and previous config saved to /var/cache/conftool/dbconfig/20200408-142341-marostegui.json
* 14:14 jeh@deploy1001: Finished deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups (duration: 03m 30s)
* 14:10 jeh@deploy1001: Started deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups
* 13:40 mutante: stopped and masked zuul-merger service on contint2001 via puppet ([[phab:T224591|T224591]])
* 13:30 ema: cp3050: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 13:22 vgutierrez: enable inbound TLSv1.3 in text@ulsfo - [[phab:T170567|T170567]]
* 13:05 ema: purged 0.1 uploaded to buster-wikimedia [[phab:T249583|T249583]]
* 12:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 12:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585219{{!}}Enable GrowthExperiments suggested edits on uk, hu, hy, eu wikipedias (T247308)]] (duration: 01m 08s)
* 12:17 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584135{{!}}Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias (T238295)]] (duration: 01m 08s)
* 12:09 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 06s)
* 11:56 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 03s)
* 11:48 mutante: logstash1009 - restarted logstash
* 11:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585766{{!}}Enable WikibaseQualityConstraints on test commons (T248117)]] (duration: 01m 05s)
* 11:43 marostegui: Deploy schema change on db1112, this will generate lag on labs s3
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P10942 and previous config saved to /var/cache/conftool/dbconfig/20200408-114315-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P10941 and previous config saved to /var/cache/conftool/dbconfig/20200408-113901-marostegui.json
* 11:29 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 06s)
* 11:28 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 17s)
* 11:05 XioNoX: push urpf log only to codfw - [[phab:T244147|T244147]]
* 10:39 jbond42: restarting idp.wikimedia.org
* 10:14 marostegui: Deploy schema change on db1078
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P10940 and previous config saved to /var/cache/conftool/dbconfig/20200408-101431-marostegui.json
* 09:30 jynus: stopping and removing db1095:s8 instance
* 09:20 godog: upgrade grafana on cloudmetrics hosts - [[phab:T244208|T244208]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P10939 and previous config saved to /var/cache/conftool/dbconfig/20200408-091728-marostegui.json
* 09:11 gehel: setting weight=10 for all pooled wdqs servers in codfw - [[phab:T246343|T246343]]
* 09:10 marostegui: Reload proxies on dbproxy1018 and dbproxy1019 to depool labsdb1011 - [[phab:T249188|T249188]] [[phab:T248592|T248592]]
* 09:07 gehel: pooling wdqs200[78] - new servers ready to go! - [[phab:T246343|T246343]]
* 08:46 marostegui: Rename wb_terms and recreate views on labsdb1009-labsdb1011 - [[phab:T248592|T248592]] [[phab:T248086|T248086]]
* 08:39 godog: upgrade grafana on grafana1002 - [[phab:T244208|T244208]]
* 08:17 _joe_: switching parsoid to envoy (take 2) in eqiad
* 07:23 marostegui: Deploy schema change on db1075
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P10937 and previous config saved to /var/cache/conftool/dbconfig/20200408-072331-marostegui.json
* 06:31 marostegui: Deploy schema change on db1095:3313
* 06:11 marostegui: Stop haproxy on dbproxy1011 - [[phab:T231520|T231520]]
* 05:44 vgutierrez: rolling upgrade ATS to 8.0.6-1wm6 in cp[5006,5012,3065,3064,2042,2041,1090,1089]
* 05:34 marostegui: Deploy schema change on dbstore1004:3313
* 05:33 _joe_: repooling wtp1025, with envoy and logging any error above 404 [[phab:T249535|T249535]]
* 04:36 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]
 
== 2020-04-07 ==
* 20:39 andrewbogott: correction: briefly downtiming ldap-eqiad-replica0 and ldap-eqiad-replica1.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 20:37 andrewbogott: briefly downtiming serpens and seaborgium.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 20:34 hoo: (Take 3) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 20:17 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 20:09 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.27 (duration: 60m 34s)
* 20:08 hoo: (Take 2) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:45 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:13 XioNoX: push pfw firewall rules - [[phab:T249650|T249650]]
* 19:08 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.27
* 18:48 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.24 (duration: 12m 44s)
* 17:56 herron: increasing codfw.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 17:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) retry (duration: 01m 02s)
* 17:54 addshore: last sync stuck on sync-masters
* 17:54 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) (duration: 01m 16s)
* 17:49 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@83c93d1]: Try to make it notice new partitions [[phab:T240702|T240702]]
* 17:40 herron: increasing eqiad.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 16:24 longma: 1.35.0-wmf.27 was branched at {{Gerrit|e76ac29cd9c57bed4097ec8a4ea8311fb55fd967}} for [[phab:T247774|T247774]]
* 16:16 hashar: restarting CI jenkins
* 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:21 moritzm: installing idp-test2001
* 15:20 XioNoX: enable uRPF loose mode (log only) on cr4-ulsfo - [[phab:T244147|T244147]]
* 15:17 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (12/14.5h) (duration: 01m 00s)
* 15:10 ema: cp3052: stop purged, start vhtcpd [[phab:T249583|T249583]] [[phab:T241232|T241232]]
* 15:00 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:56 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (10/14.5h) (duration: 00m 55s)
* 14:52 jeh: cloudvirt2003-dev: downtime in icinga and reboot to enable BIOS virtualization support [[phab:T249453|T249453]]
* 14:38 ema: cp3052: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 14:35 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (8/14.5h) (duration: 00m 58s)
* 14:25 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (4/14.5h) (duration: 00m 58s)
* 14:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (2/14.5h) (duration: 00m 58s)
* 14:08 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) take 2 (duration: 00m 57s)
* 13:57 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: REVERT [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 58s)
* 13:55 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 29s)
* 13:17 vgutierrez: restart ats-tls on cp3056 - [[phab:T249335|T249335]]
* 12:59 vgutierrez: restart ats-tls on cp3052 - [[phab:T249335|T249335]]
* 12:50 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-6.list > [[phab:T249596|T249596]]-6.out # [[phab:T249565|T249565]]
* 12:43 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-5.list > [[phab:T249596|T249596]]-5.out # [[phab:T249565|T249565]]
* 12:42 vgutierrez: restart ats-tls on cp3058 - [[phab:T249335|T249335]]
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:06 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-4.list > [[phab:T249596|T249596]]-4.out # [[phab:T249565|T249565]] [[phab:T249596|T249596]]
* 12:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'repool db1126', diff saved to https://phabricator.wikimedia.org/P10932 and previous config saved to /var/cache/conftool/dbconfig/20200407-115228-marostegui.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1126', diff saved to https://phabricator.wikimedia.org/P10931 and previous config saved to /var/cache/conftool/dbconfig/20200407-115154-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092, db1111, db1099:3318 after table rename', diff saved to https://phabricator.wikimedia.org/P10930 and previous config saved to /var/cache/conftool/dbconfig/20200407-115058-marostegui.json
* 11:50 jynus: renaming wb_items_per_site_recovered to wb_items_per_site on s8
* 11:45 jynus: stopping s8 replication on db1116:3318, db1095:3318, db2079
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092, db1111, db1099:3318 for table rename', diff saved to https://phabricator.wikimedia.org/P10929 and previous config saved to /var/cache/conftool/dbconfig/20200407-114258-marostegui.json
* 11:36 Amir1: stopped the rebuild script ([[phab:T249565|T249565]])
* 11:34 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup [[phab:T203888|T203888]], Remove old unused RejectParserCacheValue hook (duration: 00m 59s)
* 11:09 marostegui: Deploy schema change on s3 codfw
* 11:07 jynus: starting recovery on all s8 hosts
* 10:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:41 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php: [[phab:T249565|T249565]] [[phab:T249596|T249596]] Wikibase rebuildItemsPerSite.php script that allows lists of ids (duration: 01m 00s)
* 10:27 jynus: starting recovery on db1099:3318
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P10927 and previous config saved to /var/cache/conftool/dbconfig/20200407-095852-marostegui.json
* 09:49 volans@deploy1001: Finished deploy [homer/deploy@887544c]: Release v0.2.0 (take 2) (duration: 00m 26s)
* 09:49 volans@deploy1001: Started deploy [homer/deploy@887544c]: Release v0.2.0 (take 2)
* 09:38 marostegui: Deploy schema change on db1119
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P10926 and previous config saved to /var/cache/conftool/dbconfig/20200407-093820-marostegui.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P10925 and previous config saved to /var/cache/conftool/dbconfig/20200407-093638-marostegui.json
* 09:31 volans@deploy1001: Finished deploy [homer/deploy@b4522ad]: Release v0.2.0 (duration: 00m 16s)
* 09:31 volans@deploy1001: Started deploy [homer/deploy@b4522ad]: Release v0.2.0
* 09:29 volans@deploy1001: Finished deploy [homer/deploy@ac7a818]: Inject plugins (take 3) (duration: 03m 03s)
* 09:26 volans@deploy1001: Started deploy [homer/deploy@ac7a818]: Inject plugins (take 3)
* 09:19 marostegui: Deploy schema change on db1134
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P10924 and previous config saved to /var/cache/conftool/dbconfig/20200407-091847-marostegui.json
* 09:17 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (take 2) (duration: 00m 29s)
* 09:17 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins (take 2)
* 09:04 vgutierrez: testing ATS 8.0.6-1wm6 on cp4026 and cp4032
* 08:58 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (duration: 04m 59s)
* 08:53 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins
* 08:46 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v4 uplinks - [[phab:T244147|T244147]]
* 08:44 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v6 uplinks - [[phab:T244147|T244147]]
* 08:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:37 mutante: decom ganeti VM miscweb1001 (stretch) - kept backup of old racktables files and db dump in /root/racktables on miscweb1002 ([[phab:T247648|T247648]])
* 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:30 mutante: decom ganeti VM miscweb2001 (stretch)
* 08:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P10923 and previous config saved to /var/cache/conftool/dbconfig/20200407-082607-marostegui.json
* 08:17 moritzm: installing php5 security updates
* 08:06 marostegui: Deploy schema change on db1106 (this will generate lag on s1 labs)
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change', diff saved to https://phabricator.wikimedia.org/P10922 and previous config saved to /var/cache/conftool/dbconfig/20200407-080533-marostegui.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P10921 and previous config saved to /var/cache/conftool/dbconfig/20200407-080443-marostegui.json
* 07:52 _joe_: disabling puppet on mwdebug1002
* 07:47 marostegui: Failover dbproxy1011 to dbproxy1019 - [[phab:T231520|T231520]]
* 07:43 marostegui: Deploy schema change on db1080
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P10920 and previous config saved to /var/cache/conftool/dbconfig/20200407-074321-marostegui.json
* 07:41 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]] (duration: 01m 28s)
* 07:40 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]]
* 07:39 _joe_: depooling wtp1025, used for debugging
* 07:31 vgutierrez: enable parent proxies in ats-tls - [[phab:T249335|T249335]]
* 07:19 jynus: restarting s3 on db1095
* 07:02 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 06:55 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:53 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:52 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:37 moritzm: installing ruby2.1 security updates
* 06:32 jynus: stopping slave (s3) on db1095
* 05:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]], take II (duration: 00m 58s)
* 05:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]] (duration: 01m 00s)
* 05:26 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/maintenance/: [[phab:T157651|T157651]] Remove sql.php from maintenance/ (duration: 00m 58s)
* 01:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/autoload.php: [[phab:T157651|T157651]] Remove sql.php from autoloader (duration: 00m 58s)
* 01:05 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: [[phab:T208425|T208425]] [[phab:T249565|T249565]] Follow-up {{Gerrit|a956c655}}: Only avoid dropping wb_items_per_site so prod can be merged (duration: 00m 58s)
* 00:01 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] cache bust (duration: 01m 01s)
 
== 2020-04-06 ==
* 23:59 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] (duration: 00m 59s)
* 23:31 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki-staging/php-1.35.0-wmf.26$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki
* 23:26 Amir1: created wb_items_per_site
* 19:05 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:03 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:00 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 18:58 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:57 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 18:51 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:42 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:22 Urbanecm: Morning SWAT done
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]; take II) (duration: 00m 58s)
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]) (duration: 00m 59s)
* 16:54 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:52 _joe_: parsoid migrated to use envoy for TLS termination
* 16:24 _joe_: switching parsoid-php to envoy for TLS termination
* 15:45 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Label blacklist updates ([[phab:T249285|T249285]]) (duration: 00m 58s)
* 15:36 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:04 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 14:59 addshore: deploy slot done
* 14:55 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:54 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (duration: 00m 57s)
* 14:50 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:49 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (duration: 00m 58s)
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10912 and previous config saved to /var/cache/conftool/dbconfig/20200406-144220-marostegui.json
* 14:41 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (cache bust) (duration: 00m 58s)
* 14:40 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10911 and previous config saved to /var/cache/conftool/dbconfig/20200406-143755-marostegui.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10910 and previous config saved to /var/cache/conftool/dbconfig/20200406-143042-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10909 and previous config saved to /var/cache/conftool/dbconfig/20200406-142607-marostegui.json
* 14:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (cachebust) (duration: 00m 58s)
* 14:23 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:09 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:07 elukey@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:07 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:47 sukhe: upload cescout 0.1.1-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 13:26 elukey: reboot stat1008 as test to verify ROCm 3.3 upgrades
* 13:22 elukey: stat1008 upgraded to ROCm 3.3 (enables Tensorflow 2.x)
* 13:05 ema: cache: upgrade varnish to 5.1.3-1wm13, begin rolling varnish-fe restarts [[phab:T249344|T249344]]
* 13:03 marostegui: Deploy schema change on db1118
* 13:03 jbond42: updating gnutls on buster
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P10906 and previous config saved to /var/cache/conftool/dbconfig/20200406-130320-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 after schema change', diff saved to https://phabricator.wikimedia.org/P10905 and previous config saved to /var/cache/conftool/dbconfig/20200406-130255-marostegui.json
* 12:59 Urbanecm: Creation of grwikimedia is done ([[phab:T245911|T245911]])
* 12:59 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:53 marostegui: Deploy schema change on db1107
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 for schema change', diff saved to https://phabricator.wikimedia.org/P10904 and previous config saved to /var/cache/conftool/dbconfig/20200406-125308-marostegui.json
* 12:52 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P10903 and previous config saved to /var/cache/conftool/dbconfig/20200406-125222-marostegui.json
* 12:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: {{Gerrit|77b9ae9}}: Create grwikimedia
* 12:44 urbanecm@deploy1001: Synchronized dblists/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 59s)
* 12:37 XioNoX: Update eqiad analytics filters with new APT IPs
* 12:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:21 marostegui: Deploy schema change on db1089
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P10902 and previous config saved to /var/cache/conftool/dbconfig/20200406-122123-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10901 and previous config saved to /var/cache/conftool/dbconfig/20200406-122058-marostegui.json
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:04 godog: test grafana 6.7.2 upgrade on grafana2001 - [[phab:T244208|T244208]]
* 11:57 awight: EU swat complete
* 11:53 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TwoColConflict: SWAT: [[gerrit:586309{{!}}Backport talk page and EventLogging changes (T248243, T249404)]] (duration: 00m 59s)
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 11:48 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 11:48 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:586325{{!}}Create account creator and rollback groups on yowiki (T249487)]] (duration: 00m 59s)
* 11:32 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/ContentTranslation: SWAT: [[gerrit:586311{{!}}Avoid failure on restoring draft with no categories (T249400)]] (duration: 01m 02s)
* 11:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: double-syncing (duration: 00m 58s)
* 11:24 marostegui: Deploy schema change on db1105:3311
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10900 and previous config saved to /var/cache/conftool/dbconfig/20200406-112417-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10899 and previous config saved to /var/cache/conftool/dbconfig/20200406-112123-marostegui.json
* 11:18 elukey: import AMD ROCm 3.3 packages in buster-wikimedia (component thirdparty/rocm33) - [[phab:T247082|T247082]]
* 11:17 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:580394{{!}}cirrus: Increase commonswiki near match weight (T245642)]] (duration: 00m 59s)
* 11:11 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:585779{{!}} Whitelist X-Wikimedia-Debug header for cross-wiki API requests (T249107)]] (duration: 00m 59s)
* 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 01m 12s)
* 09:50 XioNoX: push pfw firewall policies - [[phab:T249267|T249267]]
* 09:40 marostegui: Deploy schema change on db1099:3311
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10898 and previous config saved to /var/cache/conftool/dbconfig/20200406-093944-marostegui.json
* 09:11 ema: cp2027: upgrade varnish to 5.1.3-1wm13 and restart varnish-fe [[phab:T249344|T249344]]
* 09:08 ema: upload varnish 5.1.3-1wm13 to buster-wikimedia on apt1001.wm.org [[phab:T249344|T249344]]
* 08:55 ariel@deploy1001: Finished deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link (duration: 00m 09s)
* 08:55 ariel@deploy1001: Started deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link
* 08:54 elukey: bootstrap wdqs200[7,8] - [[phab:T246343|T246343]]
* 08:50 marostegui: Deploy schema change on db1139:3311
* 08:18 _joe_: conversion of codfw api done
* 08:07 marostegui: Deploy schema change on dbstore1003:3311
* 07:54 vgutierrez: rolling restart of ats-tls to disable wmf-analytics log - [[phab:T249335|T249335]] [[phab:T237993|T237993]]
* 07:50 dcausse: search index: deleting stale index wikidatawiki_content_1585224806 on cloudelastic:9243
* 07:49 _joe_: eqiad API migrated to envoy for local TLS termination, now starting codfw
* 07:35 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 as attempt to fix heavy GC runs (old gen) - [[phab:T231517|T231517]]
* 07:35 marostegui: Rename wb_terms on eqiad excluding labsdb1009, labsdb1010, labsdb1011 - [[phab:T248086|T248086]]
* 07:06 marostegui: Rename wb_terms on codfw - [[phab:T248086|T248086]]
* 06:45 XioNoX: delete BGP to AS25074 in amsix
* 06:36 _joe_: converting the api servers to envoy for TLS in eqiad
* 06:30 marostegui: Upgrade dbproxy1019 - [[phab:T231520|T231520]]
* 06:18 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
* 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:50 vgutierrez: ats-tls restart in cp3056, cp3058 and cp3062 - [[phab:T249335|T249335]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P10897 and previous config saved to /var/cache/conftool/dbconfig/20200406-054559-marostegui.json
* 05:18 marostegui: Deploy schema change on db1079 (this will generate lag on s7 labs)
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P10896 and previous config saved to /var/cache/conftool/dbconfig/20200406-051744-marostegui.json
* 05:16 vgutierrez: Enable inbound TLSv1.3 in upload@eqiad - [[phab:T170567|T170567]]
* 05:16 vgutierrez: Enable TLS Session Tickets on eqiad - [[phab:T245616|T245616]]
* 05:03 vgutierrez: ats-tls restart in cp1075, cp1081 and cp1087 - [[phab:T249335|T249335]]
 
== 2020-04-03 ==
* 21:17 andrewbogott: upgraded wikitech-static to 1.34.1
* 17:58 mutante: rsync home dirs from install1002 to apt1001:/srv/home_install1002...
* 15:43 ema: cp3061: restart varnish-fe [[phab:T249344|T249344]]
* 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:18 ema: cp3057: restart varnish-fe [[phab:T249344|T249344]]
* 14:37 hashar: Restarting Jenkins for a CSP parameter [[phab:T245658|T245658]]
* 14:07 vgutierrez: restart ats-tls on cp1087 - [[phab:T249335|T249335]]
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10882 and previous config saved to /var/cache/conftool/dbconfig/20200403-140132-marostegui.json
* 13:55 vgutierrez: restart ats-tls on cp1075 and cp1081 - [[phab:T249335|T249335]]
* 12:49 marostegui: Deploy schema change on db1090:3317
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10881 and previous config saved to /var/cache/conftool/dbconfig/20200403-124908-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P10880 and previous config saved to /var/cache/conftool/dbconfig/20200403-124827-marostegui.json
* 12:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]] (duration: 00m 43s)
* 12:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]]
* 12:27 marostegui: Deploy schema change on db1136
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P10879 and previous config saved to /var/cache/conftool/dbconfig/20200403-122716-marostegui.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P10878 and previous config saved to /var/cache/conftool/dbconfig/20200403-122259-marostegui.json
* 12:00 marostegui: Deploy schema change on db1094
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P10877 and previous config saved to /var/cache/conftool/dbconfig/20200403-115959-marostegui.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20200403-115854-marostegui.json
* 11:40 marostegui: Deploy schema change on db1098:3317
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10875 and previous config saved to /var/cache/conftool/dbconfig/20200403-114004-marostegui.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10874 and previous config saved to /var/cache/conftool/dbconfig/20200403-113717-marostegui.json
* 10:38 marostegui: Deploy schema change on db1101:3317
* 10:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|861b267}}: Enable cswiki anniversary logo ([[phab:T249173|T249173]]) (duration: 01m 02s)
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10872 and previous config saved to /var/cache/conftool/dbconfig/20200403-103746-marostegui.json
* 09:32 marostegui: Deploy schema change on db1116:3317
* 08:43 marostegui: Deploy schema change on dbstore1003:3317
* 07:57 marostegui: Deploy schema change on s7 codfw master, this will generate lag on codfw
* 06:55 XioNoX: add fastnetmon 1.1.4 to buster-wikimedia - [[phab:T240658|T240658]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P10870 and previous config saved to /var/cache/conftool/dbconfig/20200403-062529-marostegui.json
* 05:21 marostegui: Deploy schema change on db1126
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P10869 and previous config saved to /var/cache/conftool/dbconfig/20200403-052115-marostegui.json
* 00:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null ([[phab:T249277|T249277]]) (duration: 01m 00s)
 
== 2020-04-02 ==
* 23:53 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 23:09 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) ([[phab:T248282|T248282]]) (duration: 00m 58s)
* 19:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: [[phab:T249258|T249258]]: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
* 19:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]] (duration: 15m 13s)
* 19:35 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/MovePage.php: [[phab:T248789|T248789]] MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
* 19:30 hashar: docker-pkg update on contint hosts
* 19:30 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
* 19:29 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 19:23 ppchelko@deploy1001: Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]]
* 19:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 19:00 longma: promoting all to 1.35.0-wmf.26
* 18:39 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]] (duration: 01m 05s)
* 18:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 18:37 longma: rolling group1 to 1.35.0-wmf.26
* 18:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: {{Gerrit|4e2a092}}: EditorGateway: Fix handling of null sectionId ([[phab:T249169|T249169]]) (duration: 01m 09s)
* 18:22 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: {{Gerrit|94ded03}}: Fix issues with treating section "numbers" as integers ([[phab:T248795|T248795]]; [[phab:T248968|T248968]]; [[phab:T249112|T249112]]) (duration: 01m 10s)
* 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}} (duration: 03m 21s)
* 17:45 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}}
* 16:53 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
* 16:53 joal@deploy1001: Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
* 16:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: [[phab:T249162|T249162]] Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
* 16:44 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
* 16:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 16:33 jforrester@deploy1001: sync-file aborted: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
* 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
* 16:30 joal@deploy1001: Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
* 16:19 XioNoX: upgrade netflow4001's fastnetmon to 1.1.4 - [[phab:T240658|T240658]]
* 14:56 XioNoX: push new test switch config for cloudvirt2001 - [[phab:T248425|T248425]]
* 14:33 vgutierrez: Enable inbound TLSv1.3 in upload@codfw - [[phab:T170567|T170567]]
* 14:33 vgutierrez: Enable TLS Session tickets in codfw - [[phab:T245616|T245616]]
* 14:24 jbond42: updating bluez on ganeti and cloudvirt
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
* 13:50 marostegui: Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - [[phab:T248967|T248967]]
* 13:44 vgutierrez: update puppet compiler facts
* 13:40 marostegui: Deploy schema change on db1111
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
* 13:32 gehel: OSM data reimport on maps2004 - [[phab:T249086|T249086]]
* 12:55 mutante: mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
* 12:52 mutante: mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
* 11:06 mutante: decom planet1001 ([[phab:T248863|T248863]])
* 10:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:19 marostegui: Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
* 10:17 elukey: set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
* 09:47 marostegui: Remove haproxy@10.64.37.14 from labsdb hosts - [[phab:T231280|T231280]] [[phab:T248944|T248944]]
* 09:44 gehel: CORRECTION: depool maps2004 for data reimport - [[phab:T249086|T249086]]
* 09:40 gehel: depool wdqs2004 for data reimport - [[phab:T249086|T249086]]
* 09:33 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
* 09:32 oblivian@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 09:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
* 09:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
* 08:51 marostegui: Deploy schema change on db1104
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
* 08:28 gehel: repooling wdqs1006 - caught up on lag
* 08:22 vgutierrez: Enable inbound TLSv1.3 in upload@esams - [[phab:T170567|T170567]]
* 08:21 vgutierrez: Enable TLS Session tickets in esams - [[phab:T245616|T245616]]
* 07:45 moritzm: bounced ferm on ms-be1040
* 07:27 marostegui: Deploy schema change on db1092
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
* 05:49 marostegui: Deploy schema change on db1101:3318
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
* 05:29 elukey: powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)
 
== 2020-04-01 ==
* 22:44 volker-e@deploy1001: Finished deploy [design/style-guide@4bfe647]: Deploy design/style-guide:  (duration: 00m 08s)
* 22:43 volker-e@deploy1001: Started deploy [design/style-guide@4bfe647]: Deploy design/style-guide:
* 22:02 volans: forcing logrotate on netflow2001 to compress yesterday's logs
* 21:53 volans: force-rebooting ms-be1023, unresponsive - [[phab:T249174|T249174]]
* 21:50 volans: stopped and restarted kafkatee-webrequest.service on netflow2001, was in a restart loop
* 19:48 marxarelli: rollback of 1.35.0-wmf.26 from group1 ([[phab:T247773|T247773]]). blocked by [[phab:T249162|T249162]]
* 19:30 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback 1.35.0-wmf.26 from group1
* 19:21 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26 (duration: 01m 06s)
* 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26
* 19:18 marxarelli: promoting group1 to 1.35.0-wmf.26
* 17:21 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqord*' commit 'enable sampling on eqord Iac15379cc'
* 16:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqdfw*' commit 'enable sampling on eqdfw Iac15379cc'
* 16:39 vgutierrez: pool cp2027 - [[phab:T248816|T248816]]
* 16:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:17 ariel@deploy1001: Finished deploy [dumps/dumps@21363c1]: page range prefetch fixup (duration: 00m 09s)
* 16:17 ariel@deploy1001: Started deploy [dumps/dumps@21363c1]: page range prefetch fixup
* 15:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:31 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:27 vgutierrez: depool & decommission cp20[16,19,23,27] - [[phab:T249125|T249125]]
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10845 and previous config saved to /var/cache/conftool/dbconfig/20200401-152258-marostegui.json
* 15:11 herron: performing kafka-main rolling restarts to pick up security updates
* 14:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 vgutierrez: depool && decommission cp[2018,2020,2022,2024-2026].codfw.wmnet - [[phab:T249115|T249115]]
* 14:32 gehel: depooling wdqs1006 to allow catching up on lag
* 14:30 vgutierrez: pool cp2042 - [[phab:T248816|T248816]]
* 14:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 XioNoX: remove AS-path prepending in esams
* 13:47 XioNoX: remove AS-path prepending in eqsin
* 13:39 vgutierrez: pool cp2041 - [[phab:T248816|T248816]]
* 13:34 mutante: sodium (mirror): sudo -u mirror ftpsync to get Debian mirror updated (Icinga says it's old)
* 13:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 marostegui: Deploy schema change on db1099:3318
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10843 and previous config saved to /var/cache/conftool/dbconfig/20200401-131719-marostegui.json
* 13:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:19 tgr@deploy1001: Synchronized wmf-config/config: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 06s)
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 tgr@deploy1001: Synchronized dblists/growthexperiments.dblist: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 05s)
* 12:17 XioNoX: restart nfacct on netflow4001 for kafka tls tests - [[phab:T248980|T248980]]
* 12:15 vgutierrez: depool & decommission cp2013 - [[phab:T249088|T249088]]
* 12:14 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 06s)
* 12:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585059{{!}}Enable password-reset-update on all other than Wikipedias (T245791)]] (duration: 01m 07s)
* 12:09 marostegui: Deploy schema change on db1116:3318
* 12:05 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons take 2 (duration: 01m 08s)
* 12:04 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons (duration: 01m 05s)
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]; take II) (duration: 01m 05s)
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]) (duration: 01m 07s)
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php:  [SDC] Enable WikibaseQualityConstraints on Commons take II (duration: 01m 06s)
* 11:44 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons (duration: 01m 18s)
* 11:20 cormacparle__: created table wbqc_constraints on commonswiki
* 11:03 jbond42: install bluez update on ganeti-canary and cloudvirt/cloudcontrol-dev
* 11:01 mutante: planet1001 - reinstall OS to test install_server switch, ATS switched to planet1002 earlier
* 10:47 marostegui: Deploy schema change on dbstore1005:3318
* 10:25 vgutierrez: pool cp2040 - [[phab:T248816|T248816]]
* 10:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=canary
* 09:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:37 marostegui: Deploy schema change on s8 codfw, this will generate lag on codfw
* 09:35 XioNoX: Update install servers IPs (dhcp helpers + firewall rules) - [[phab:T224576|T224576]]
* 09:34 mutante: install_servers: DHCP_relay in routers and TFTP server in DHCP server config have been switched from install1002/2002 to install1003/2003 - doing a test install, but if any issues report on [[phab:T224576|T224576]]
* 09:26 marostegui: last entry was for db2093
* 09:26 marostegui: Downgrade mariadb package from 10.4.12-2 to 10.4.12-1
* 09:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:05 mutante: planet - the backend server has been switched from planet1001 (stretch) to planet1002 (buster) - [[phab:T247651|T247651]]
* 08:46 mutante: deneb, boron: systemctl reset-failed to clear up systemd state alerts
* 08:43 marostegui: Stop haproxy on dbproxy1010 [[phab:T248944|T248944]]
* 08:37 jynus: restart bacula at backup1001
* 08:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:28 vgutierrez: depool & decommission cp2017 - [[phab:T249084|T249084]]
* 08:21 vgutierrez: pool cp2039 - [[phab:T248816|T248816]]
* 08:09 marostegui: Deploy schema change on db1138 (s4 primary master)
* 08:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 after schema change', diff saved to https://phabricator.wikimedia.org/P10841 and previous config saved to /var/cache/conftool/dbconfig/20200401-071339-marostegui.json
* 07:12 vgutierrez: pool cp2038 - [[phab:T248816|T248816]]
* 06:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 06:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 vgutierrez: depool & decommission cp2012 - [[phab:T249080|T249080]]
* 06:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui: Deploy schema change on db1121 (this will create lag on s4 labs)
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P10840 and previous config saved to /var/cache/conftool/dbconfig/20200401-053827-marostegui.json
* 00:39 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Update http and prot rel links to https, fix link to sitelist in MW Core (duration: 01m 06s)
* 00:12 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Add export-0.11 (duration: 01m 05s)
 
== 2020-03-31 ==
* 22:23 marxarelli: group0 to 1.35.0-wmf.26 ([[phab:T247773|T247773]]); no rise in error rates following redeployment
* 22:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.26
* 22:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 21:54 dduvall@deploy1001: sync aborted: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]]) (duration: 07m 31s)
* 21:47 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 21:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/user/UserNameUtils.php: [[phab:T249045|T249045]] Use wfMessage in UserNameUtils::isUsable for now (duration: 00m 58s)
* 21:05 eileen: process-control config revision is {{Gerrit|f80d248113}} - (catch up dedupe now off - fyi MBeat )
* 20:59 hashar: contint1001: manually reverted /lib/systemd/system/jenkins.service
* 20:51 hashar: Restarting Jenkins for new CSP rules # [[phab:T245658|T245658]]
* 20:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rolling back 1.35.0-wmf.26 testwiki deployment following significant increase in error rate (cc [[phab:T247773|T247773]])
* 20:14 marxarelli: correction: RequestContext::getLanguage errors are for testwiki deployment, pre group0
* 20:08 marxarelli: a slew of "ErrorException from line 334 of /srv/mediawiki/php-1.35.0-wmf.26/includes/context/RequestContext.php: PHP Warning: Recursion detected in RequestContext::getLanguage" after group0 deployment (cc [[phab:T247773|T247773]])
* 20:04 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache (duration: 142m 48s)
* 19:20 ariel@deploy1001: Finished deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly (duration: 00m 04s)
* 19:20 ariel@deploy1001: Started deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly
* 18:08 ariel@deploy1001: Finished deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date (duration: 00m 05s)
* 18:07 ariel@deploy1001: Started deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date
* 17:42 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache
* 17:40 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.23 (duration: 26m 51s)
* 17:38 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad.service on cloudelastic1001 to see if it recovers from a thrashing/gc state - [[phab:T231517|T231517]]
* 16:30 marxarelli: 1.35.0-wmf.26 was branched at {{Gerrit|bec758b668aaa57fc259a1d0ecf3b35340d2661b}} for [[phab:T247773|T247773]]
* 16:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 16:15 vgutierrez: pool cp2037 - [[phab:T248816|T248816]]
* 15:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:35 mutante: decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) [[phab:T247780|T247780]]
* 15:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 vgutierrez: depool & decommission cp2010 - [[phab:T249002|T249002]]
* 15:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245794|T245794]] Enable DiscussionTools as a beta feature on four wikis (duration: 01m 00s)
* 15:05 cdanis: cr1-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 15:01 cdanis: cr2-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 14:43 vgutierrez: pool cp2036 - [[phab:T248816|T248816]]
* 14:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[4-8].eqiad.wmnet
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091 after schema change', diff saved to https://phabricator.wikimedia.org/P10834 and previous config saved to /var/cache/conftool/dbconfig/20200331-141459-marostegui.json
* 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 13:31 vgutierrez: Enable TLS Session tickets in eqsin - [[phab:T245616|T245616]]
* 13:05 XioNoX: update nat on pfw3-codfw - [[phab:T248906|T248906]]
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:49 _joe_: switching all appserver canaries to envoy
* 12:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:45 marostegui: Deploy schema change on db1091
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for schema change', diff saved to https://phabricator.wikimedia.org/P10833 and previous config saved to /var/cache/conftool/dbconfig/20200331-124452-marostegui.json
* 12:34 _joe_: transitioning mw1261 to envoy
* 12:23 vgutierrez: rolling upgrade of ATS to version 8.0.6-1wm5 - [[phab:T248938|T248938]]
* 11:30 Lucas_WMDE: EU SWAT done
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]], take II (duration: 00m 57s)
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]] (duration: 00m 58s)
* 11:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]], take II (duration: 00m 59s)
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]] (duration: 01m 00s)
* 10:46 _joe_: disabled puppet on canary appservers, potentially dangerous change ahead
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084 after schema change', diff saved to https://phabricator.wikimedia.org/P10831 and previous config saved to /var/cache/conftool/dbconfig/20200331-101953-marostegui.json
* 10:03 XioNoX: add BGP to AS41327 in AMS-IX
* 09:49 XioNoX: push homer diffs to mr1-eqsin
* 09:36 XioNoX: push homer diffs to mr1-eqiad
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 09:05 vgutierrez: upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - [[phab:T248938|T248938]]
* 09:00 vgutierrez: depool & decommission cp2011 - [[phab:T248950|T248950]]
* 08:44 vgutierrez: pool cp2035 - [[phab:T248816|T248816]]
* 08:31 mutante: signed puppet cert for planet1002.eqiad.wmnet
* 08:29 marostegui: Depool db1084 for schema change
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for schema change', diff saved to https://phabricator.wikimedia.org/P10829 and previous config saved to /var/cache/conftool/dbconfig/20200331-082904-marostegui.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change', diff saved to https://phabricator.wikimedia.org/P10828 and previous config saved to /var/cache/conftool/dbconfig/20200331-082711-marostegui.json
* 08:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:01 XioNoX: delete unused ROA for ARIN v4 prefixes - [[phab:T235886|T235886]]
* 07:49 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:17 vgutierrez: pool cp2034 - [[phab:T248816|T248816]]
* 07:16 marostegui: Deploy schema change on db1081
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for schema change', diff saved to https://phabricator.wikimedia.org/P10827 and previous config saved to /var/cache/conftool/dbconfig/20200331-071547-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10826 and previous config saved to /var/cache/conftool/dbconfig/20200331-071401-marostegui.json
* 06:48 marostegui: Deploy schema change on db1103:3314
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10825 and previous config saved to /var/cache/conftool/dbconfig/20200331-064707-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10824 and previous config saved to /var/cache/conftool/dbconfig/20200331-064627-marostegui.json
* 05:55 marostegui: Drop nova and nova_api from m5 master (db1133) - [[phab:T248313|T248313]]
* 05:55 kart_: Updated cxserver to 2020-03-30-145349-production ([[phab:T248578|T248578]])
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 05:53 vgutierrez: depool && decommission cp2007 - [[phab:T248941|T248941]]
* 05:48 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:46 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:46 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:26 marostegui: Deploy schema change on db1097:3314
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10822 and previous config saved to /var/cache/conftool/dbconfig/20200331-051354-marostegui.json
* 00:26 eileen: civicrm revision changed from {{Gerrit|cf2e2c11c3}} to {{Gerrit|524b162174}}, config revision is {{Gerrit|708198a154}}
 
== 2020-03-30 ==
* 23:30 cdanis: cr3-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Alphabetize wikis in each GrowthExperiments settings (duration: 00m 58s)
* 23:16 cdanis: cr2-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:08 cdanis: cdanis@cr3-knams# commit comment "sensible flow table sizes [[phab:T248394|T248394]]"
* 22:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Provide wmgSiteLogoIcon (duration: 00m 57s)
* 22:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgSiteLogoIcon for each project family and four special wikis (duration: 00m 58s)
* 22:50 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Set wgMobileFrontendLogo from wgLogos['icon'] if set (duration: 00m 59s)
* 22:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 57s)
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split wgLogos setting into wmgSiteLogo1x etc. (duration: 00m 59s)
* 22:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Construct wgLogos in CommonSettings so that projects can inherit values (duration: 01m 02s)
* 19:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 ejegg: updated payments listener (standalone SmashPig) from {{Gerrit|dc0c6b208b}} to {{Gerrit|d80e4c5abd}}
* 15:32 vgutierrez: pool cp2033 - [[phab:T248816|T248816]]
* 15:25 jeh: add icinga 2h downtime and soft reset iDRAC on labstore1005.mgmt.eqiad.wmnet [[phab:T247965|T247965]]
* 14:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 vgutierrez: depool & decommission cp2008 - [[phab:T248864|T248864]]
* 14:23 vgutierrez: pool cp2032 - [[phab:T248816|T248816]]
* 14:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:01 vgutierrez: depool & decommission cp2006 - [[phab:T248856|T248856]]
* 13:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 vgutierrez: pool cp2031 - [[phab:T248816|T248816]]
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:53 vgutierrez: depool & decommission cp2005 - [[phab:T248848|T248848]]
* 12:26 cdanis: cdanis@re0.cr2-codfw# set chassis fpc 5 inline-services flex-flow-sizing    cdanis@re0.cr2-codfw# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 12:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:21 vgutierrez: depool & decommission cp2004 - [[phab:T248824|T248824]]
* 12:03 XioNoX: delete unused ROA for ARIN v6 prefixes - [[phab:T235886|T235886]]
* 11:59 XioNoX: delete unused ROAs for RIPE prefixes - [[phab:T235886|T235886]]
* 11:42 mutante: miscweb2002 - race condition with apache2 mpm and php7.3 module met - a2dismod mpm_event ; systemctl restart apache2 ; puppet agent -tv (also see [[phab:T196968|T196968]], https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) [[phab:T247887|T247887]]
* 11:37 mutante: miscweb2002 - installed OS, added to puppet, added role and  ... sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config ([[phab:T247648|T247648]])
* 11:30 marostegui: Deploy schema change on dbstore1004:3314
* 11:22 XioNoX: delete ARIN allocations from RIPE's IRR - [[phab:T235886|T235886]]
* 11:11 Urbanecm: EU SWAT done
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]; take II) (duration: 00m 58s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]) (duration: 00m 58s)
* 11:08 vgutierrez: pool cp2030 - [[phab:T248816|T248816]]
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]; take II) (duration: 00m 59s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]) (duration: 00m 59s)
* 10:43 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:34 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 10:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:59 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata JSON dumps start at 9:59 UTC today ([[phab:T248612|T248612]])
* 09:56 hoo@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/Wikibase/repo/maintenance/DumpEntities.php: DumpEntities: Fix DB group default override ([[phab:T248612|T248612]]) (duration: 01m 02s)
* 09:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 vgutierrez: pool cp2029 - [[phab:T248816|T248816]]
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 vgutierrez: depool & decommission cp2002 - [[phab:T248818|T248818]]
* 07:48 marostegui: Run cloudcontrol1003:~# wmcs-wikireplica-dns to promote dbproxy1018 to wikireplicas active proxy [[phab:T231520|T231520]]
* 07:40 marostegui: Replace dbproxy1010 with dbproxy1011 for wiki replicas, analytics - [[phab:T231520|T231520]]
* 07:28 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T248333|T248333]]
* 07:26 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T248333|T248333]]
* 07:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 07:10 vgutierrez: depool and decommission cp2001 - [[phab:T248815|T248815]]
* 06:52 vgutierrez: pool cp2028 - [[phab:T247340|T247340]]
* 06:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10813 and previous config saved to /var/cache/conftool/dbconfig/20200330-062858-marostegui.json
* 06:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui: Deploy schema change on db1074 with replication, this will generate lag on s2 labs
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P10812 and previous config saved to /var/cache/conftool/dbconfig/20200330-060338-marostegui.json
* 05:40 vgutierrez: pool cp2027 - [[phab:T247340|T247340]]
* 05:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 vgutierrez: Enable TLS Session tickets in ulsfo - [[phab:T245616|T245616]]
* 04:32 vgutierrez: upgrade ATS to version 8.0.6-1wm4 on ulsfo - [[phab:T245616|T245616]]
 
== 2020-03-29 ==
* 08:24 elukey: powercycle elastic1059 - mgmt/serial console stuck, no ssh - racadm getsel shows a lot of OEM errors occurred, nothing specific
 
== 2020-03-28 ==
* 16:54 elukey: restart yarn on analytics1071
* 12:05 vgutierrez: preemptive restart of ats-tls on cp1081 and cp3062 - [[phab:T248736|T248736]]
* 11:32 vgutierrez: restart ats-tls on cp1077 - [[phab:T248736|T248736]]
* 08:34 vgutierrez: pool cp1089
* 08:30 vgutierrez: restarting ats-tls on cp1089
 
== 2020-03-27 ==
* 20:51 ejegg: updated payments-wiki from {{Gerrit|db618f429d}} to {{Gerrit|1640f5e21e}}
* 15:15 andrew@deploy1001: Finished deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens (duration: 03m 35s)
* 15:12 andrew@deploy1001: Started deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P10807 and previous config saved to /var/cache/conftool/dbconfig/20200327-144125-marostegui.json
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P10806 and previous config saved to /var/cache/conftool/dbconfig/20200327-142240-marostegui.json
* 14:19 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10805 and previous config saved to /var/cache/conftool/dbconfig/20200327-133022-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P10804 and previous config saved to /var/cache/conftool/dbconfig/20200327-130706-marostegui.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10803 and previous config saved to /var/cache/conftool/dbconfig/20200327-130542-marostegui.json
* 12:49 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --interface-admin
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10802 and previous config saved to /var/cache/conftool/dbconfig/20200327-122144-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10801 and previous config saved to /var/cache/conftool/dbconfig/20200327-122058-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10800 and previous config saved to /var/cache/conftool/dbconfig/20200327-120234-marostegui.json
* 11:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase202[123].codfw.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[123].codfw.wmnet
* 11:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2023.codfw.wmnet
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2022.codfw.wmnet
* 11:44 oblivian@puppetmaster1001: conftool action : edit; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[1].codfw.wmnet
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2021.codfw.wmnet
* 10:55 mutante: revoke puppet cert webserver-misc-apps.discovery.wmnet and recreate with additional SANs for new VMs
* 10:45 mutante: miscweb1002 - upload and unpack RackTables-0.21.4 ([[phab:T247646|T247646]] [[phab:T247648|T247648]])
* 10:28 marostegui: Alter db2125 s2 to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 10:12 mutante: miscweb1002 - sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config [[phab:T247648|T247648]]
* 10:04 vgutierrez: upload trafficserver 8.0.6-1wm4 to apt.wm.o (buster) - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 10:03 mutante: sodium - find /srv/mirrors/debian/ -user root -exec chown -h mirror:mirror <nowiki>{</nowiki><nowiki>}</nowiki> \;  (-h to also fix symbolic links); sudo -u mirror ftpsync ([[phab:T248660|T248660]])
* 10:02 marostegui: Alter db2084:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 10:01 marostegui: Alter db1096:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 09:37 mutante: sodium - running ftpsync as user mirror ([[phab:T248660|T248660]])
* 09:36 mutante: sodium fixing root owned files in /srv/mirrors/debian to be owned by mirror:mirror ([[phab:T248660|T248660]])
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P10799 and previous config saved to /var/cache/conftool/dbconfig/20200327-093214-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10798 and previous config saved to /var/cache/conftool/dbconfig/20200327-093106-marostegui.json
* 07:58 marostegui: Deploy schema change on s2 codfw - this will generate lag on s2 codfw - [[phab:T248333|T248333]]
* 07:36 elukey: execute 'rm /etc/logrotate.d/ceph-common' on cloudvirt[1,2]* and cloudcontrol* to stop daily cronspam (file not in the puppet catalog anymore)
* 07:32 moritzm: installing grub2 updates from Stretch point release
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P10796 and previous config saved to /var/cache/conftool/dbconfig/20200327-072334-marostegui.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P10795 and previous config saved to /var/cache/conftool/dbconfig/20200327-070224-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P10794 and previous config saved to /var/cache/conftool/dbconfig/20200327-070014-marostegui.json
* 06:31 marostegui: Deploy schema change on db1082, this will generate lag on s5 labs
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P10793 and previous config saved to /var/cache/conftool/dbconfig/20200327-063042-marostegui.json
 
== 2020-03-26 ==
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]; take II) (duration: 00m 57s)
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]) (duration: 00m 58s)
* 22:51 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/user/UserRightsProxy.php: {{Gerrit|I9121f5aae}} (4/4) (duration: 00m 58s)
* 22:50 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/search/SearchMySQL.php: {{Gerrit|I9121f5aae}} (3/4) (duration: 00m 58s)
* 22:48 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/objectcache/SqlBagOStuff.php: {{Gerrit|I9121f5aae}} (2/4) (duration: 00m 58s)
* 22:44 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/jobqueue/jobs/RecentChangesUpdateJob.php: {{Gerrit|I9121f5aae}} (1/4) (duration: 01m 00s)
* 22:05 ejegg: updated fundraising CiviCRM from {{Gerrit|f1cb23e809}} to {{Gerrit|cf2e2c11c3}}
* 21:43 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/MachineVision: Fix: Stop sorting label suggestions by Wikidata ID in ApiQueryImageLabels (duration: 01m 00s)
* 21:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:32 cdanis: cdanis@re0.cr1-eqsin# set chassis afeb slot 0 inline-services flex-flow-sizing, then cdanis@re0.cr1-eqsin# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 21:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:27 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}} (duration: 03m 07s)
* 21:24 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}}
* 21:15 cdanis: repool ulsfo
* 21:12 cdanis: applied flow-table-size configuration to cr4-ulsfo which did not need a reboot to apply it [[phab:T248394|T248394]]
* 20:51 cdanis: cdanis@cr3-ulsfo> request system reboot
* 20:36 cdanis: depool ulsfo
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:34 XioNoX: stop exchanging full BGP view between eqiad and codfw - [[phab:T246721|T246721]]
* 16:19 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 XioNoX: stop advertising 208.80.152.0/22 from eqiad - [[phab:T246721|T246721]]
* 16:15 mutante: signing puppet cert for miscweb1002, installed buster, added insetup role ([[phab:T247887|T247887]])
* 16:15 ebernhardson: set cloudelastic-chi wikidatawiki_content to 0 replicas while reindexing
* 16:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 moritzm: rebooting mw2150 for some tests
* 16:12 XioNoX: stop advertising 2620:0:860::/46 from eqiad - [[phab:T246721|T246721]]
* 16:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:58 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:53 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:51 moritzm: installing grub2 updates from Stretch point release
* 15:49 XioNoX: start advertising 208.80.154.0/23 from eqiad - [[phab:T246721|T246721]]
* 15:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:40 XioNoX: start advertising 2620:0:861::/48 from eqiad - [[phab:T246721|T246721]]
* 15:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 mutante: [[phab:T247887|T247887]] - create Ganeti VM miscweb1002.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row C with 1 vCPUs, 2GB of RAM, 20GB of disk in the private network.
* 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P10787 and previous config saved to /var/cache/conftool/dbconfig/20200326-135625-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P10786 and previous config saved to /var/cache/conftool/dbconfig/20200326-132940-marostegui.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10785 and previous config saved to /var/cache/conftool/dbconfig/20200326-130122-marostegui.json
* 12:57 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-main to use envoy [[phab:T244843|T244843]] (duration: 01m 07s)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10784 and previous config saved to /var/cache/conftool/dbconfig/20200326-123302-marostegui.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10783 and previous config saved to /var/cache/conftool/dbconfig/20200326-123157-marostegui.json
* 12:25 mutante: analytics1028 - performing a puppet change on every run (all other hosts doing this were fixed just recently)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10782 and previous config saved to /var/cache/conftool/dbconfig/20200326-121859-marostegui.json
* 11:38 awight: EU SWAT done
* 11:37 awight@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/TwoColConflict: SWAT: [[gerrit:583576{{!}}Two hotfixes for guided tour (T248465)]] (duration: 01m 07s)
* 11:25 mutante: sodium - running ftpsync to get Debian mirror in sync
* 11:23 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 05s)
* 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 06s)
* 11:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/ContentTranslation/modules/ui/mw.cx.ui.Categories.js: SWAT: {{Gerrit|1ea6bad}}: Allow publishing to continue even with broken categories ([[phab:T248302|T248302]]) (duration: 01m 07s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|d1bb0b1}}: Removed expired throttle.php entries (duration: 01m 09s)
* 11:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:58 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:16 XioNoX: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 - [[phab:T207753|T207753]]
* 09:50 elukey: reboot stat1008 - gpu + drivers in a weird state after multiple tests
* 09:00 XioNoX: push v4 conditional advertising on cr3-knams - [[phab:T236785|T236785]]
* 08:44 marostegui: Deploy schema change on s5 codfw, lag will show up on codfw - [[phab:T248333|T248333]]
* 08:27 XioNoX: troubleshot v6 conditional advertisement from cr3-knams - [[phab:T236785|T236785]]
* 07:58 XioNoX: remove BGP session to AS8001 in eqiad (down and not replying to email)
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P10781 and previous config saved to /var/cache/conftool/dbconfig/20200326-074033-marostegui.json
* 07:31 marostegui: Deploy schema change on db1085, lag will appear on s6 on labs
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P10780 and previous config saved to /var/cache/conftool/dbconfig/20200326-073048-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P10779 and previous config saved to /var/cache/conftool/dbconfig/20200326-070746-marostegui.json
* 06:59 marostegui: Deploy schema change on db1093
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P10778 and previous config saved to /var/cache/conftool/dbconfig/20200326-065929-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P10777 and previous config saved to /var/cache/conftool/dbconfig/20200326-065814-marostegui.json
* 06:48 marostegui: Deploy schema change on db1088
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P10776 and previous config saved to /var/cache/conftool/dbconfig/20200326-064748-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10775 and previous config saved to /var/cache/conftool/dbconfig/20200326-064648-marostegui.json
* 06:39 marostegui: Deploy schema change on db1098:3316
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10774 and previous config saved to /var/cache/conftool/dbconfig/20200326-063844-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10773 and previous config saved to /var/cache/conftool/dbconfig/20200326-063633-marostegui.json
* 06:26 marostegui: Deploy schema change on db1096:3316
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10772 and previous config saved to /var/cache/conftool/dbconfig/20200326-062631-marostegui.json
* 06:22 marostegui: Rename nova and nova_api tables on db1117:3325 - [[phab:T248313|T248313]]
* 00:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on testwiki ([[phab:T247645|T247645]]) (duration: 03m 14s)
 
== 2020-03-25 ==
* 23:49 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add investigate to $wgAvailableRights ([[phab:T247645|T247645]]) (duration: 03m 16s)
* 23:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Retry because mw1251 timed out, and it is a proxy (duration: 03m 15s)
* 23:38 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Add new investigate right ([[phab:T247645|T247645]]) (duration: 03m 17s)
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:05 rlazarus: updating eventgate-logging-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints (duration: 02m 57s)
* 21:56 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:46 urandom: dropping unused Cassandra keyspaces -- [[phab:T248018|T248018]]
* 21:45 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 21:44 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 21:44 rlazarus: updating eventgate-analytics-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:16 rlazarus: holding off on updating eventgate-analytics until EU time, to check on unexpected helmfile diffs [[phab:T246868|T246868]]
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:07 rlazarus: updating eventgate-analytics to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:36 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 20:32 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 20:22 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 20:22 rlazarus: updating cxserver to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:19 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 20:19 rlazarus: updating citoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:36 hasharDinner: Jenkins restarted on all machines
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:29 rlazarus: updating eventstreams to envoy 1.13.1 [[phab:T246868|T246868]]
* 19:28 twentyafterfour: group1 looks good after deploying wmf.25 refs [[phab:T233873|T233873]]
* 19:27 hashar: upgrading Jenkins # [[phab:T248122|T248122]]
* 19:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.25  refs [[phab:T233873|T233873]]
* 19:26 twentyafterfour: scap sync-proxies failed on mw1251
* 18:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]] (duration: 14m 00s)
* 18:39 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]]
* 18:39 ppchelko@deploy1001: Finished deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints (duration: 14m 28s)
* 18:24 ppchelko@deploy1001: Started deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints
* 18:21 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: re-sync, mw1251 failed (duration: 03m 18s)
* 18:13 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: SWAT: [[gerrit:583393{{!}}Mentorship module: Update for root screen refactor (T248422)]] (duration: 03m 23s)
* 18:06 ppchelko@deploy1001: Finished deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints (duration: 01m 40s)
* 18:05 ppchelko@deploy1001: Started deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints
* 17:43 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:38 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:33 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 16:50 moritzm: installing python-bleach security updates
* 16:47 moritzm: updated jenkins packages on apt.wikimedia.org to 2.222.1
* 16:33 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:32 sukhe: upload cescout 0.1.0-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 16:17 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:15 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 16:07 rlazarus: updating blubberoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2115 after reimage to Buster', diff saved to https://phabricator.wikimedia.org/P10767 and previous config saved to /var/cache/conftool/dbconfig/20200325-152148-marostegui.json
* 15:14 moritzm: installing deneb.codfw.wmnet [[phab:T248165|T248165]]
* 14:51 cdanis: repool codfw [[phab:T248394|T248394]]
* 14:46 mutante: closed port 80 for caching servers on misc backends https://gerrit.wikimedia.org/r/q/topic:%22applayer-tls%22+(status:open%20OR%20status:merged) as final step per service on [[phab:T210411|T210411]]
* 14:39 mutante: static microsites (annual.wikimedia.org, research.wikimedia.org, static-bugzilla etc). closed port 80 for caching servers, finalizing switch to https behind caching servers
* 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:48 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:26 _joe_: cumin A:puppetmaster 'apt-get -y install puppet-common'
* 13:03 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 marostegui: Deploy schema change on db1139:3316
* 12:45 marostegui: Stop MySQL on db2115 for reimage to buster
* 11:50 cdanis: cr1-codfw: `set chassis fpc 5 inline-services flex-flow-sizing` and `request chassis fpc restart slot 5` [[phab:T248394|T248394]]
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 for upgrade', diff saved to https://phabricator.wikimedia.org/P10763 and previous config saved to /var/cache/conftool/dbconfig/20200325-114655-marostegui.json
* 11:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:37 mutante: decom mw1250 - mw1253
* 11:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:35 cdanis: depool codfw for router maintenance [[phab:T248394|T248394]]
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:32 mutante: decom mw1232 - mw1235
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:27 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[0-3].eqiad.wmnet
* 11:26 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[2-5].eqiad.wmnet
* 11:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:21 Urbanecm: EU SWAT done
* 11:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[2-5].eqiad.wmnet
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|59412db}}: Add gwtoolset to available rights to allow granting to global groups (duration: 01m 07s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|7b8d7c5}}: TwoColConflict: Limited default deployment CommonSettings.php ([[phab:T244863|T244863]]) (duration: 01m 06s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]; take II) (duration: 01m 06s)
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]) (duration: 01m 17s)
* 11:08 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1091 load, increase main traffic on all other s4 instances', diff saved to https://phabricator.wikimedia.org/P10762 and previous config saved to /var/cache/conftool/dbconfig/20200325-110821-jynus.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1137', diff saved to https://phabricator.wikimedia.org/P10761 and previous config saved to /var/cache/conftool/dbconfig/20200325-105503-marostegui.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json
* 10:37 XioNoX: change aggregate policy for 2620:0:862::/48 on cr3-knams - [[phab:T236785|T236785]]
* 10:19 XioNoX: change aggregate policy for v4 prefixes on cr2-eqdfw - [[phab:T236785|T236785]]
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 09:56 XioNoX: change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - [[phab:T236785|T236785]]
* 09:54 vgutierrez: Enable inbound TLSv1.3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:23 vgutierrez: upgrade ATS to 8.0.6-1wm3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 marostegui: Reimage db1137
* 08:18 marostegui: Reboot db1117 for full-upgrade
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 08:14 _joe_: upgrading all eventgate-main to envoy 1.13.1 [[phab:T246868|T246868]]
* 08:12 marostegui: Stop all mysql daemons on db1117
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 07:42 XioNoX: reboot scs-eqsin for CPU usage
* 07:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json
* 06:57 marostegui: Deploy schema change on db2129 (s6 codfw master)
* 06:15 marostegui: Rename tables on db1133 (m5 master) nova_api database - [[phab:T248313|T248313]]
* 06:13 marostegui: Remove grants 'nova'@'208.80.154.23' on nova.* - [[phab:T248313|T248313]]
 
== 2020-03-24 ==
* 20:53 cdanis: repool eqsin
* 20:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s)
* 20:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 20:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s)
* 20:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs [[phab:T233873|T233873]]
* 20:32 twentyafterfour@deploy1001: Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 20:31 cdanis: rebooting cr2-eqsin [[phab:T248394|T248394]]
* 20:30 twentyafterfour@deploy1001: Synchronized wmf-config: Now sync InitialiseSettings* refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 20:28 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs [[phab:T248409|T248409]] (duration: 00m 58s)
* 20:27 volans: force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console)
* 20:26 cdanis: commit flow-table-size on cr2-eqsin [[phab:T248394|T248394]]
* 20:19 cdanis: eqsin depooled for router maintenance at 16:15
* 19:29 twentyafterfour@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 19:29 twentyafterfour: rolling back to wmf.24 due to high error rate refs [[phab:T233873|T233873]]
* 19:28 twentyafterfour@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 18:49 gehel: repooling wdqs1006, caught up on lag
* 17:12 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]] (duration: 77m 52s)
* 17:10 ebernhardson: update cloudelastic-chi replica counts from 2 to 1 [[phab:T231517|T231517]]
* 16:41 moritzm: installing linux-perf updates on stretch
* 16:31 moritzm: installing linux-perf-4.19 updates on buster
* 15:58 mutante: installing OS on otrs1001.eqiad.wmnet ([[phab:T248028|T248028]])
* 15:55 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]]
* 15:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:31 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.22 (duration: 02m 02s)
* 15:29 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.21 (duration: 24m 00s)
* 15:17 hashar: Cleaning old MediaWiki deployments # [[phab:T233873|T233873]]
* 15:03 hashar: Applied patches to 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 14:59 hashar: scap prep 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 14:55 gehel: depooling wdqs1006 to catch up on lag
* 14:28 marostegui: Deploy schema change on db2117 (s6)
* 14:26 hashar: Branching wmf/1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 13:22 moritzm: installing glib2.0 updates from Stretch point release
* 13:04 moritzm: installing mariadb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 12:16 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Toroid~huwiki' 'Toroidt' ([[phab:T248371|T248371]])
* 12:10 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Erika Greenberg' 'Copperqueen' ([[phab:T248371|T248371]])
* 11:57 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Romy merdeka' 'Romy_Dwi_Laksono' ([[phab:T248371|T248371]])
* 11:55 marostegui: Deploy schema change on db2087 db2089 db2097
* 11:34 Urbanecm: EU SWAT done
* 11:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]; take II) (duration: 00m 59s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 00m 59s)
* 11:25 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 01m 03s)
* 10:08 gehel: restart blazegraph and updater on wdqs1004
* 09:41 marostegui: Deploy schema change on db2076 (s6)
* 08:39 marostegui: Rename nova database tables on db1133 (m5 master) - [[phab:T248313|T248313]]
* 08:25 marostegui: Rename wikidatawiki.wb_terms on db1104 - [[phab:T248086|T248086]]
* 07:33 elukey: restart update-openstack-mirror.service on sodium
* 06:55 marostegui: Reboot dbproxy1018
* 06:42 marostegui: Reboot dbproxy1019
* 06:16 marostegui: Create empty database testreduce on m5 master [[phab:T245408|T245408]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1087, vslow s8, with weight 1 as it originally had', diff saved to https://phabricator.wikimedia.org/P10753 and previous config saved to /var/cache/conftool/dbconfig/20200324-060133-marostegui.json
 
== 2020-03-23 ==
* 21:50 krinkle@deploy1001: Synchronized docroot/noc/css/vector.css: {{Gerrit|I627a0ddba5}} (duration: 01m 02s)
* 21:39 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@26aa5c3]: Update recommendation-api to {{Gerrit|3141cb6}} (duration: 03m 21s)
* 18:45 Urbanecm: Morning SWAT done
* 18:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]; take II) (duration: 00m 59s)
* 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]) (duration: 01m 00s)
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]; take II) (duration: 00m 59s)
* 18:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 18:32 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: {{Gerrit|cbda0e5}}: ApiVisualEditorEdit: Fix handling of minor parameter ([[phab:T248257|T248257]]) (duration: 01m 00s)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|212114e}}: Dont try to grant `oathauth-enable` to `*` ([[phab:T248282|T248282]]) (duration: 00m 59s)
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]], take II) (duration: 00m 59s)
* 18:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]; take II) (duration: 00m 59s)
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]) (duration: 01m 00s)
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:08 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:31 ema: upload atskafka 0.5 to buster-wikimedia [[phab:T237993|T237993]]
* 15:59 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable client side error logging for group0 and hawwike - [[phab:T226986|T226986]] (take 2) (duration: 00m 59s)
* 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable client side error logging for group0 and hawwike - [[phab:T226986|T226986]] (duration: 01m 00s)
* 15:32 moritzm: installing mariadb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 15:24 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:13 moritzm: installing freetype updates from Stretch point release
* 15:04 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (take 2) (duration: 00m 59s)
* 15:00 jynus@cumin1001: dbctl commit (dc=all): 'Remove db1089 for special groups (rc)', diff saved to https://phabricator.wikimedia.org/P10749 and previous config saved to /var/cache/conftool/dbconfig/20200323-150046-jynus.json
* 15:00 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (duration: 01m 01s)
* 14:46 jynus@cumin1001: dbctl commit (dc=all): 'Finish doubling db1107 main s1 traffic', diff saved to https://phabricator.wikimedia.org/P10748 and previous config saved to /var/cache/conftool/dbconfig/20200323-144612-jynus.json
* 14:40 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1107 main s1 traffic a 50%', diff saved to https://phabricator.wikimedia.org/P10747 and previous config saved to /var/cache/conftool/dbconfig/20200323-144005-jynus.json
* 14:35 jynus@cumin1001: dbctl commit (dc=all): 'remove db1107 from special groups', diff saved to https://phabricator.wikimedia.org/P10746 and previous config saved to /var/cache/conftool/dbconfig/20200323-143536-jynus.json
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily disable client side error logging for a deploy - [[phab:T226986|T226986]] (duration: 01m 01s)
* 13:33 moritzm: installing python-cryptography updates from Stretch point release
* 12:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 tgr@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/OAuth/includes/frontend/specialpages/SpecialMWOAuthManageMyGrants.php: SWAT: [[gerrit:582768{{!}}Get consumerKey from consumerId not from acceptanceId (T247531)]] (duration: 01m 01s)
* 11:32 ema: cp1081: restart prometheus-trafficserver-tls-exporter.service
* 11:27 elukey: upload oozie 4.3.0-3 to thirdparty/bigtop14 on wikimedia-stretch - [[phab:T244499|T244499]]
* 10:37 jbond42: switch idp1001 to tlsproxy::envoy profile
* 08:07 marostegui: Start m1 and m2 on db1117
* 08:04 marostegui: Stop m1 and m2 on db1117 to transfer them to db1077 - this will trigger dbproxies IRC alert
* 08:03 moritzm: installing python-cryptography bug fix updates from Stretch point release
* 07:46 marostegui: Stop MySQL on db1077 (non used) for 10.4 upgrade and gtid_domain_id on multisource [[phab:T149418|T149418]]
 
== 2020-03-22 ==
* 23:19 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T248274|T248274]] (duration: 01m 19s)
* 04:37 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
 
== 2020-03-20 ==
* 23:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:04 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:41 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[4-9].eqiad.wmnet
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[0-1].eqiad.wmnet
* 20:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[7-9].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[4-9].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-1].eqiad.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[7-9].eqiad.wmnet
* 15:44 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/ActorMigration.php: Avoid upsert() log warning spam in ActorMigration due to unique key array format - [[phab:T248147|T248147]] (duration: 01m 01s)
* 13:34 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:33 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10741 and previous config saved to /var/cache/conftool/dbconfig/20200320-121628-marostegui.json
* 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 11:10 elukey: upload oozie 4.3.0-2 packages to thirdparty/bigtop14 on wikimedia-stretch
* 10:56 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:56 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:29 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:13 dcausse: repooling wdqs1006
* 09:28 moritzm: rolling restart of FPM on mw1261-mw1265 for freetype update
* 08:59 moritzm: installing freetype bugfix updates from stretch point release
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1017', diff saved to https://phabricator.wikimedia.org/P10739 and previous config saved to /var/cache/conftool/dbconfig/20200320-084730-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10738 and previous config saved to /var/cache/conftool/dbconfig/20200320-083334-marostegui.json
* 07:59 XioNoX: reorder LVS BGP neighbors and add descriptions - https://gerrit.wikimedia.org/r/576320
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10737 and previous config saved to /var/cache/conftool/dbconfig/20200320-074816-marostegui.json
* 07:46 elukey: upload hadoop_2.8.5-2 (and related debs) to thirdparty/bigtop14 on wikimedia-stretch (manually rebuilt via docker after patch backports from upstream)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10736 and previous config saved to /var/cache/conftool/dbconfig/20200320-073205-marostegui.json
* 07:26 marostegui: Restart mysql on es1017 for upgrade - [[phab:T239791|T239791]]
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1017 for update [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10735 and previous config saved to /var/cache/conftool/dbconfig/20200320-070945-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1014 to es3 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10734 and previous config saved to /var/cache/conftool/dbconfig/20200320-070922-marostegui.json
 
== 2020-03-19 ==
* 22:15 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}} (duration: 05m 13s)
* 22:10 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}}
* 19:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.24
* 18:30 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/ByIdDispatchingEntityInfoBuilder.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part II (duration: 01m 07s)
* 18:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/data-access/src/SingleEntitySourceServices.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part I (duration: 01m 08s)
* 17:47 mutante: releases/releases-jenkins - closed firewall hole to port 80 for caching servers - kept it open just for envoy from the backends - ATS speaks https to them meanwhile
* 16:54 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/RelatedArticles: Do not register "" as a style path, that breaks ResourceLoader - [[phab:T248090|T248090]] (duration: 01m 07s)
* 16:01 jeh@deploy1001: Finished deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule (duration: 03m 31s)
* 15:57 jeh@deploy1001: Started deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule
* 15:19 andrew@deploy1001: deploy aborted: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this) (duration: 00m 00s)
* 14:54 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 14:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 13:32 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.24 (duration: 01m 07s)
* 13:31 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.24
* 13:11 marostegui: Rename testwikidatawiki.wb_terms on db1078 - [[phab:T248086|T248086]]
* 12:33 XioNoX: push frack fw policies [[phab:T248004|T248004]]
* 11:43 Lucas_WMDE: EU SWAT done
* 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.24/includes/OutputPage.php: SWAT: [[gerrit:581245{{!}}OutputPage: Fix warning when setting wgUserNewMsgRevisionId (T248049)]] (duration: 01m 08s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]; take II) (duration: 01m 08s)
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]) (duration: 01m 07s)
* 10:47 ema: upload atskafka 0.4 to buster-wikimedia [[phab:T237993|T237993]]
* 10:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/skin.json: [[gerrit:581248{{!}}skins.vector.styles.legacy needs to define legacy feature (T247566)]] (duration: 01m 08s)
* 10:01 ema: cp: rolling ats-tls-restart to apply log format changes [[phab:T248067|T248067]] [[phab:T237993|T237993]]
* 09:26 marostegui: m2 maintenance window done [[phab:T246098|T246098]]
* 09:03 akosiaris: restart gerrit on gerrit1001 [[phab:T246098|T246098]]
* 09:02 akosiaris: restart otrs-daemon, apache on mendelevium [[phab:T246098|T246098]]
* 09:01 akosiaris: restart recommendation-api on scb [[phab:T246098|T246098]]
* 09:00 marostegui: Restart m2 primary database master - [[phab:T246098|T246098]]
* 08:48 dcausse: depooling wdqs1006 to help it catch up on lag
* 08:43 dcausse: restarting blazegraph on wdqs1006 ([[phab:T242453|T242453]])
* 07:54 moritzm: installing cups updates from Stretch point release
* 07:48 moritzm: installing libjaxen-java security updates from Stretch point release
* 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Update pc1008 spare situation [[phab:T247787|T247787]] (duration: 01m 09s)
* 06:49 elukey: execute 'sudo rm /etc/logrotate.d/ceph-common' on cloudvirt-dev and cloudcontrol-dev to stop daily cronspam
* 06:46 marostegui: Deploy schema change on testcommonswiki.globalimagelinks (empty table) on the s4 master [[phab:T243987|T243987]]
* 06:33 marostegui: Upgrade db1132 without restarting [[phab:T246098|T246098]]
* 00:39 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.24 refs [[phab:T233872|T233872]]
* 00:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: deploy https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581116 which reverts https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581054 refs  [[phab:T248010|T248010]] (duration: 01m 07s)
* 00:18 eileen: civicrm revision changed from {{Gerrit|a1b2cbeac1}} to {{Gerrit|1c477ff07f}}, config revision is {{Gerrit|37232d8460}}
 
== 2020-03-18 ==
* 23:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.23/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581114/ refs [[phab:T248010|T248010]] (duration: 01m 07s)
* 23:26 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581115/ (duration: 01m 08s)
* 22:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:18 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:56 Krinkle: krinkle@mw1385: scap pull # clean up AdHoc debugging for [[phab:T248010|T248010]]
* 21:16 brennen@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 06s)
* 21:11 brennen@deploy1001: Synchronized php-1.35.0-wmf.23/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 15s)
* 20:04 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:49 hashar@deploy1001: rebuilt and synchronized wikiversions files: Ensure fleet wide consistency
* 19:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:21 mutante: shutting down (decom cookbook) elnath.codfw.wmnet ([[phab:T188544|T188544]])
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:15 fdans@deploy1001: Finished deploy [analytics/refinery@549f6a4]: deploying analytics refinery (duration: 15m 02s)
* 19:11 hashar: 1.35.0-wmf.24 is on hold: too many blockers
* 19:00 fdans@deploy1001: Started deploy [analytics/refinery@549f6a4]: deploying analytics refinery
* 18:32 Lucas_WMDE: Morning SWAT done
* 18:30 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:579018{{!}}Update linter whitelist w/ parsoid11's IP address (T246833)]] (beta-only) (duration: 01m 04s)
* 18:20 Lucas_WMDE: scap pull on mwdebug1001, attempting to fix mismatched wikiversions alert
* 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 08s)
* 18:13 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]], take II (duration: 01m 07s)
* 18:11 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 07s)
* 16:43 mutante: wtp1025 - Icinga alerted it's running out of disk - 'apt-get clean' lowered disk usage from 97% to 91%
* 16:00 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]] (duration: 61m 23s)
* 14:58 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]]
* 14:41 vgutierrez: disable TLS session tickets in ulsfo - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 14:29 godog: add debug to icinga2001 - [[phab:T247538|T247538]]
* 14:28 _joe_: restarted php-fpm on mw1283, was throwing SIGILL
* 14:17 marostegui: Rename wb_terms on codfw hosts: s8 (wikidatawiki - db2081), s3 (testwikidatawiki - db2109), s4 (commonswiki, testcommonswiki - db2106)  [[phab:T208425|T208425]]
* 14:06 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.23
* 11:59 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 07s)
* 11:57 hashar@deploy1001: Synchronized php-1.35.0-wmf.23/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 10s)
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10715 and previous config saved to /var/cache/conftool/dbconfig/20200318-114259-marostegui.json
* 11:17 ema: upload atskafka 0.3 to buster-wikimedia [[phab:T237993|T237993]]
* 11:16 kart_: EU Mid-day SWAT done
* 11:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]], take II (duration: 01m 07s)
* 11:10 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]] (duration: 01m 07s)
* 10:58 _joe_: setting num_retries=0 on mw2224 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 10:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)
* 10:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]] (duration: 01m 08s)
* 10:52 _joe_: setting num_retries=0, idle_timeout=5s on mw2223 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 10:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]], take II (duration: 01m 07s)
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]] (duration: 01m 07s)
* 10:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 10:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 07s)
* 10:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 10:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 08s)
* 09:43 vgutierrez: enabling inbound TLSv1.3 in upload@ulsfo - [[phab:T170567|T170567]]
* 09:18 vgutierrez: enabling inbound TLSv1.3 in cp4026 - [[phab:T170567|T170567]]
* 08:44 marostegui: Start replication pc1008 from pc1010 to get some of the new keys so it is not fully empty - [[phab:T247787|T247787]]
* 08:14 vgutierrez: upgrade ATS to 8.0.6-1wm3 in ulsfo - [[phab:T170567|T170567]]
* 07:55 moritzm: installing remaining libxslt security updates
* 07:40 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-analytics to use envoy everywhere (duration: 01m 10s)
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:31 marostegui: Reboot pc1008 to try to get its RAID redone - [[phab:T247787|T247787]]
* 00:31 Amir1: foreachwikiindblist medium deleteEqualMessages.php --delete ([[phab:T247562|T247562]])
* 00:10 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 02m 29s)
* 00:08 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
* 00:07 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 01m 17s)
* 00:06 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
 
== 2020-03-17 ==
* 22:49 Amir1: warming up cache for Q80M to Q88M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 22:17 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}} (duration: 06m 08s)
* 22:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}}
* 21:54 Krinkle: krinkle@mw2170$ disable-puppet (Testing for [[phab:T99740|T99740]])
* 21:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting (again) ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 21:10 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 20:50 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters, take 2 ([[phab:T244974|T244974]]) (duration: 01m 12s)
* 20:33 mutante: boron - systemctl start docker-reporter-k8s-images ; systemctl start docker-reporter-releng-images
* 20:31 mutante: boron - had degraded systemd state in Icinga - systemctl start docker-reporter-base-images
* 19:54 mutante: miscweb1001 - restarted ferm, reverted live hack
* 19:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]] (duration: 14m 31s)
* 19:51 mutante: miscweb1001 - testing if ferm 80 firewall hole is needed for envoy, temp. disabled puppet, restarted ferm
* 19:38 ppchelko@deploy1001: Started deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]]
* 19:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]], take II (duration: 01m 06s)
* 19:00 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]] (duration: 01m 07s)
* 18:53 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580390{{!}}Do not lock rows when there's no term returned (T247553 T246898)]], To catch the train (duration: 01m 08s)
* 18:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:39 mutante: removing mw1238 through mw1243 - decom with cookbook ([[phab:T247780|T247780]] [[phab:T245099|T245099]])
* 18:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[8-9].eqiad.wmnet
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[0-3].eqiad.wmnet
* 18:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:01 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}} (duration: 06m 06s)
* 18:00 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:58 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:56 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/languages/LanguageConverter.php: [[gerrit:580361{{!}}languages: Don't assume  in LanguageConverter (T235360)]] (duration: 01m 07s)
* 17:55 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}}
* 17:55 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-3].eqiad.wmnet
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[89].eqiad.wmnet
* 17:52 Amir1: warming up cache for Q70M to Q80M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 17:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580352{{!}}Do not lock rows when there's no term returned (T247553 T246898)]] (duration: 01m 07s)
* 17:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:37 ejegg: updated payments-wiki from {{Gerrit|86ce0361f9}} to {{Gerrit|72856949a1}}
* 17:30 bearND: mobileapps deploy failed on canary, rolled back
* 17:29 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}} (duration: 04m 00s)
* 17:25 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}}
* 17:24 elukey@deploy1001: Finished deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2 (duration: 00m 43s)
* 17:24 elukey@deploy1001: Started deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2
* 17:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 17:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1280.eqiad.wmnet
* 17:10 jynus: purging some old rows on pc1010 on a screen to earn some time [[phab:T247788|T247788]]
* 16:56 mutante: mw1280 - scap pull - had ancient mw version due to downtime
* 16:46 mutante: mw1280 back after long downtime due to broken RAM, added back into puppet ([[phab:T240187|T240187]])
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Reverting All wikis to 1.35.0-wmf.23
* 15:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:52 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 05m 16s)
* 15:51 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:44 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 03m 49s)
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:01 hashar: scap prep 1.35.0-wmf.24 and applying security patches # [[phab:T233872|T233872]]
* 15:00 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:57 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:44 dcausse: wdqs1010 (test server) is running a data-reload cookbook (and is probably taking longer than the expected downtime)
* 14:38 hashar: mediawiki/core git push {{Gerrit|68bc9300dc}}:wmf/1.35.0-wmf.24  to catch up with a change that got merged while branch is being cut # [[phab:T233872|T233872]]
* 14:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]], take II (duration: 01m 04s)
* 14:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]] (duration: 01m 10s)
* 14:24 marostegui: Stop mysql and restart pc1008 [[phab:T247787|T247787]]
* 14:23 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:21 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580328{{!}}Store item terms as late as possible to avoid deadlocks (T247553 T246898)]] (duration: 01m 07s)
* 14:13 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:07 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 14:06 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:41 hashar: Branching 1.35.0-wmf.24 # [[phab:T233872|T233872]]
* 13:30 godog: stop puppet and turn on debug on icinga2001 - [[phab:T247538|T247538]]
* 12:06 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 12:06 cdanis@cumin1001: START - Cookbook sre.network.cf
* 11:46 godog: test pinning icinga to a subset of cpu on icinga1001
* 11:16 akosiaris: [[phab:T242461|T242461]] undeploy restrouter. Unused service that, per the task, will not be used after all
* 11:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 11:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 11:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 10:56 XioNoX: add extra prepend to LG export filter
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:41 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:40 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 10:40 jbond42: sec update for libgraphicsmagick on maps
* 10:20 godog: bounce squid on install1003 [[phab:T247759|T247759]]
* 10:07 _joe_: sudo cumin -b2 -s 50 'A:mw-jobrunner' 'restart-php7.2-fpm' [[phab:T247622|T247622]]
* 10:03 Amir1: warming up cache for Q60M to Q70M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 10:02 ema: create kafka topic atskafka_test_webrequest_text [[phab:T247497|T247497]]
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 09:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]], take II (duration: 01m 05s)
* 09:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]] (duration: 01m 09s)
* 09:27 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 09:21 ema: cp: rolling varnish-frontend-restart to decrease memory usage and apply transient storage limits [[phab:T185968|T185968]]
* 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 08:39 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 00:57 krinkle@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Formatters/: {{Gerrit|Ic77b2c6b33a}}, [[phab:T247458|T247458]] (duration: 01m 12s)
 
== 2020-03-16 ==
* 23:14 tzatziki: reset email for "MNadrofsky (WMF)" on SUL and officewiki
* 20:58 mutante: mw1223 power down
* 20:54 mutante: powercycling mw1223
* 20:52 mutante: 5 old API appservers in eqiad removed
* 20:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[1-6].eqiad.wmnet
* 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:04 mutante: depool (yes->no) mw1221 - mw1226 ([[phab:T247780|T247780]])
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
* 19:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}} (duration: 06m 48s)
* 19:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:24 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:23 jynus: stop replication at pc1010 at pos pc1007-bin.080617:259138670
* 19:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}}
* 19:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 instead of pc1008 as pc1008 is overloaded (duration: 01m 06s)
* 18:38 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|I2c3217fb3da8bb65}} (duration: 01m 07s)
* 18:36 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op, courtesy of opcache (duration: 01m 06s)
* 18:34 krinkle@deploy1001: Synchronized docroot/noc/: {{Gerrit|I2c3217fb3}} (duration: 01m 07s)
* 18:18 mforns@deploy1001: Finished deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118 (duration: 13m 01s)
* 18:05 mforns@deploy1001: Started deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118
* 17:08 Amir1: warming up cache for Q50M to Q60M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 17:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]], take II (duration: 01m 08s)
* 17:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]] (duration: 01m 06s)
* 16:54 gehel: repooling wdqs1005
* 16:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enforce Content Security Policy if wmgUseCSP is set [[phab:T244124|T244124]] (duration: 01m 06s)
* 16:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseCSP false everywhere [[phab:T244124|T244124]] (duration: 01m 07s)
* 16:34 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I498e2ebd8c9}} (duration: 01m 07s)
* 16:33 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|I498e2ebd8c9}} (no-op) (duration: 01m 07s)
* 16:30 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|I870122f946d}} (duration: 01m 07s)
* 16:22 rlazarus: copied envoyproxy_1.13.1-1 from buster-wikimedia to stretch-wikimedia
* 16:21 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I08af45e2e47}} (duration: 01m 07s)
* 16:14 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|Ie9002d9095ee}} (duration: 01m 08s)
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-recursive_0.0.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-anaphora_0.0.4-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 15:02 moritzm: rolling restart of FPM/apache on netmon* to pick up libxslt security updates
* 14:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]], take II (duration: 01m 06s)
* 14:22 Amir1: warming up cache for Q40M to Q50M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 14:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]] (duration: 01m 07s)
* 14:16 moritzm: rolling restart of FPM on mw1261-mw1265 to pick up libxslt security updates
* 14:15 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 87500000 --to-id 87767570 --batch-size=10 --sleep=5 ([[phab:T219123|T219123]])
* 14:05 moritzm: installing libxslt security updates
* 13:49 ema: upload atskafka 0.1 to buster-wikimedia [[phab:T237993|T237993]]
* 13:42 gehel: restarting blazegraph on wdqs1007
* 13:30 gehel: depooling wdqs1005 to catch up on lag
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1015', diff saved to https://phabricator.wikimedia.org/P10706 and previous config saved to /var/cache/conftool/dbconfig/20200316-124309-marostegui.json
* 12:09 Amir1: warming up cache for Q35M to Q40M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 12:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]], take II (duration: 01m 07s)
* 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]] (duration: 01m 08s)
* 11:52 XioNoX: manually fix prometheus squid exporter on install1003
* 11:04 Amir1: ... for Q30M-Q35M of the new term store
* 11:04 Amir1: Warming up InnoDB buffer pool cache in db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 10:55 Amir1: warming up db1026 for up to Q35M for the new term store ([[phab:T219123|T219123]])
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10705 and previous config saved to /var/cache/conftool/dbconfig/20200316-104723-marostegui.json
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]), take II (duration: 01m 07s)
* 10:43 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]) (duration: 01m 13s)
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10704 and previous config saved to /var/cache/conftool/dbconfig/20200316-104002-marostegui.json
* 10:36 elukey: roll restart of recommendation service on scb* as attempt to fix the flapping alerts - [[phab:T247732|T247732]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10703 and previous config saved to /var/cache/conftool/dbconfig/20200316-102829-marostegui.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10702 and previous config saved to /var/cache/conftool/dbconfig/20200316-101707-marostegui.json
* 10:10 marostegui: Stop mysql for upgrade on es1015 [[phab:T239791|T239791]]
* 10:02 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=0 --file=15march2217-holes-nulls.list on screen ([[phab:T219123|T219123]])
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for upgrade and restart [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10701 and previous config saved to /var/cache/conftool/dbconfig/20200316-093228-marostegui.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1011 to es2 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10700 and previous config saved to /var/cache/conftool/dbconfig/20200316-093048-marostegui.json
* 08:16 marostegui: Review and enable events on recently migrated 10.4 hosts - [[phab:T247728|T247728]]
* 08:02 ema: cp4025 restart trafficserver-tls to clear 'tls process restarted' alert [[phab:T241593|T241593]] [[phab:T185968|T185968]]
* 07:57 moritzm: installing libxslt security updates
* 07:52 ema: cp4025: restart varnish-fe to clear 'child restarted' alert [[phab:T185968|T185968]]
* 07:47 moritzm: installing lxml security updates
* 07:14 moritzm: installing libgd2 security updates on jessie
* 06:54 moritzm: removing some library packages from jessie/stretch after labstore1006/1007 dist-upgrade to buster
* 06:38 _joe_: restart envoy with 10 requests per connection on mw2231, [[phab:T247484|T247484]]
 
== 2020-03-15 ==
* 23:20 jynus: removed oldest snapshots on dbprov1001
* 13:27 dcausse: restarting blazegraph on wdqs1005 [[phab:T242453|T242453]]
* 07:01 marostegui: Restart logrotate on db1107
 
== 2020-03-14 ==
* 08:33 elukey: run kafka preferred-replica-election on kafka-jumbo1001 - [[phab:T247561|T247561]]
* 08:32 elukey: run systemctl restart systemd-timedated.service on stat1008
* 01:06 mutante: planet1001 - copying /etc/apt/sources.list from planet2001 to planet1001 - apt-get update - apt-get install openssh-server [[phab:T247592|T247592]]
 
== 2020-03-13 ==
* 23:12 bstorm_: rebooting labstore1006 for upgrade to stretch [[phab:T224583|T224583]]
* 22:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:45 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 22:27 bstorm_: rebooting labstore1006 [[phab:T224583|T224583]]
* 22:21 bstorm_: downtimed labstore1006 for upgrades [[phab:T224583|T224583]]
* 20:02 mutante: stat1005 - ip link set en01 down ; ip link set en01 up ([[phab:T247561|T247561]])
* 19:30 bstorm_: rebooting labstore1007 for upgrade to buster [[phab:T224583|T224583]]
* 18:51 shdubsh: test increase fs.inotify.max_user_watches on prometheus2004
* 17:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:21 mutante: removed squid from install1002/install2002 (formerly webproxy.(eqiad{{!}}codfw).wmnet until 2 days ago, replaced by install1003/install2003) [[phab:T224576|T224576]]
* 17:20 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 17:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:08 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 17:00 krinkle@deploy1001: Synchronized dblists/: {{Gerrit|If4d17082f}}, {{Gerrit|Iadba5b01b}}, {{Gerrit|Ibe16d5f09}} (duration: 01m 07s)
* 16:58 krinkle@deploy1001: Synchronized wmf-config/config/: {{Gerrit|Ibe16d5f09}} (duration: 01m 10s)
* 16:51 bstorm_: rebooting labstore1007 for stretch upgrade [[phab:T224583|T224583]]
* 16:37 krinkle@deploy1001: Synchronized wmf-config/config/: {{Gerrit|If4d17082f}}, {{Gerrit|Iadba5b01b}} (duration: 01m 11s)
* 16:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 bstorm_: rebooting labstore1007 for first cycle of upgrades [[phab:T224583|T224583]]
* 16:02 elukey: powercycle kafka-jumbo1006 after switch port changed - [[phab:T247561|T247561]]
* 15:28 _joe_: switch envoy logging to debug on mw2231
* 14:57 cdanis: [[phab:T247586|T247586]] ✔️ cdanis@grafana1002.eqiad.wmnet ~ 🕥☕ sudo systemctl restart apache2.service
* 12:48 Urbanecm: Password reset for SUL User:FuduBot ([[phab:T247601|T247601]])
* 12:16 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 16s)
* 10:26 moritzm: installing python-werkzeug security updates
* 10:09 vgutierrez: upload trafficserver 8.0.6-1wm3 to apt.wm.o (buster) - [[phab:T245616|T245616]]
* 09:55 _joe_: running puppet across appservers to switch to http for eventgate-analytics [[phab:T247484|T247484]]
* 09:17 moritzm: installing perl updates from Stretch point release
* 06:16 vgutierrez: triggering OCSP response updates in eqiad,codfw and ulsfo - [[phab:T247584|T247584]]
* 06:12 vgutierrez: triggering OCSP response updates in eqsin - [[phab:T247584|T247584]]
* 06:05 vgutierrez: triggering OCSP response updates in esams - [[phab:T247584|T247584]]
* 00:20 shdubsh: reload prometheus@ops on prometheus1003
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw215[8-9].codfw.wmnet
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw216[0-9].codfw.wmnet
* 00:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw217[1-2].codfw.wmnet
* 00:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-03-12 ==
* 23:58 shdubsh: reload prometheus@ops on prometheus1004
* 23:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[1-2].codfw.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet
* 23:40 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw215[89].codfw.wmnet
* 23:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw215[89].codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2178.codfw.wmnet
* 23:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[0-6].codfw.wmnet
* 22:45 krinkle@deploy1001: Synchronized multiversion/: {{Gerrit|I403a9890a9}} (duration: 01m 07s)
* 22:44 krinkle@deploy1001: Synchronized dblists/: {{Gerrit|I403a9890a9}} (duration: 01m 09s)
* 22:41 mforns@deploy1001: Finished deploy [analytics/refinery@906bd1e]: deploying refinery together with refinery-source v0.0.118 (duration: 12m 20s)
* 22:28 mforns@deploy1001: Started deploy [analytics/refinery@906bd1e]: deploying refinery together with refinery-source v0.0.118
* 22:15 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:15 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 22:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:07 bstorm_: moving all nfs traffic off labstore1007 and to labstore1006 for upgrades [[phab:T224583|T224583]]
* 22:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:05 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 22:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:47 mutante: doc1001 - had to manually run "/usr/local/sbin/build-envoy-config -c /etc/envoy/" to get envoy tls_terminator_443 listener into the config or envoy would not listen on 443 ([[phab:T210411|T210411]])
* 21:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 21:19 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:06 foks: remove one file for legal compliance
* 20:49 ottomata: kafka-jumbo1006 - stopping kafka and powercycling - [[phab:T247561|T247561]]
* 20:15 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.35.0-wmf.23"
* 20:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.23
* 20:10 mutante: revoking puppet cert for doc.discovery.wmnet, re-creating with doc.wikimedia.org as SAN
* 20:09 eileen: civicrm revision changed from {{Gerrit|a301076871}} to {{Gerrit|a1b2cbeac1}}, config revision is {{Gerrit|37232d8460}}
* 19:46 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set term store to WRITE_BOTH for all of Wikidata", take II (duration: 01m 06s)
* 19:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set term store to WRITE_BOTH for all of Wikidata" (duration: 01m 08s)
* 19:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:34 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: cirrus: Start Glent m0 AB test (duration: 01m 07s)
* 18:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: re-sync InitialiseSettings.php (duration: 01m 08s)
* 18:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579326{{!}}Set term store to WRITE_BOTH for all of Wikidata (T219123)]] (duration: 01m 07s)
* 18:23 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579348{{!}}Switch kowiki to use ORES for suggested edits topics]] (duration: 01m 08s)
* 18:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:48 elukey: increase via 'kadmin.local modprinc -maxlife 2d $user' all max ticket lifetimes of Kerberos User principals on the krb1001's KDC (changes will be propagated to codfw automatically)
* 17:48 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:17 elukey: execute modprinc -maxlife 2d krbtgt/WIKIMEDIA via kadmin.local on krb1001 (will be propagated to 2001 automatically)
* 17:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:06 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:53 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:28 volans: restarting icinga, acting up on command file (frack awol and downtimes)
* 16:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 16:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 rlazarus: uploading envoyproxy_1.13.1-1 (upgrade from 1.12.2) [[phab:T246868|T246868]]
* 14:51 elukey: restart kpropd daemon on krb2001
* 14:26 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:23 volans@cumin2001: START - Cookbook sre.dns.netbox
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:35 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
* 13:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 13:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 13:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:33 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:29 volans@cumin2001: START - Cookbook sre.dns.netbox
* 12:00 tarrow: EU SWAT done
* 12:00 tarrow@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/TwoColConflict: SWAT: [[gerrit:579221{{!}}Detect whether an edit came from VisualEditor (T245722)]] (duration: 01m 10s)
* 11:42 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:42 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:37 volans@cumin2001: START - Cookbook sre.dns.netbox
* 11:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:09 elukey: roll restart of krb-kdc on krb1001/krb2001 to pick up new ticket lifetime settings (10h -> 48h)
* 11:09 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:09 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:02 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:59 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:39 volans@cumin2001: START - Cookbook sre.dns.netbox
* 10:29 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:28 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:28 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:13 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:13 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 09:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 09:58 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:55 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch ores to use envoy (duration: 01m 08s)
* 08:36 addshore: start "rebuild" of Q87 -> 87.5 million for [[phab:T219123|T219123]]
* 08:27 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 ([[phab:T219123|T219123]]) cache bust (duration: 01m 08s)
* 08:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 ([[phab:T219123|T219123]]) (duration: 01m 12s)
* 08:12 elukey: push new install/webproxy terms for analytics-in4/6 to cr1/cr2-eqiad
* 07:28 kart_: Updated cxserver charts to 0.0.13
* 07:26 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 07:24 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 07:22 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 06:14 kart_: Updated cxserver to 2020-03-12-041806-production and added sectionmapping db config ([[phab:T246316|T246316]], [[phab:T243430|T243430]], [[phab:T202276|T202276]])
* 06:11 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 06:08 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 06:03 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 01:51 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/WikimediaEditorTasks: Revert 'Fix revert counting for non-language-specific counters' ([[phab:T247479|T247479]]) (duration: 01m 08s)
* 01:13 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@4e2ea09]: resolve deadlock in bulk_daemon (duration: 10m 05s)
* 01:03 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@4e2ea09]: resolve deadlock in bulk_daemon
* 00:56 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: wait around for counts to match up in reindexer before giving up (duration: 01m 08s)
* 00:53 ebernhardson: wmf.23 cirrussearch: wait around for counts to match before giving up
* 00:52 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/CirrusSearch/includes/Maintenance/Reindexer.php: (no justification provided) (duration: 01m 12s)
* 00:23 mutante: switching webproxy.eqiad.wmnet / webproxy.codfw.wmnet to install[12]003 (squids on buster)
* 00:16 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable depicts counter due to code revert ([[phab:T244974|T244974]]), take 2 (duration: 01m 07s)
* 00:14 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable depicts counter due to code revert ([[phab:T244974|T244974]]) (duration: 01m 07s)
* 00:00 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Revert 'Fix revert counting for non-language-specific counters' ([[phab:T247479|T247479]]) (duration: 01m 07s)
 
== 2020-03-11 ==
* 23:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable depicts counter ([[phab:T244974|T244974]]) (Simon says) (duration: 01m 07s)
* 23:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable depicts counter ([[phab:T244974|T244974]]) (duration: 01m 07s)
* 23:51 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:51 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:42 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.22/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters (duration: 01m 08s)
* 23:40 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters (duration: 01m 11s)
* 23:18 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|I91b3a18317af}} (duration: 01m 08s)
* 22:39 volans@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 22:39 volans@cumin2001: START - Cookbook sre.dns.netbox
* 22:28 mutante: depooled mw2167 through mw2172 - rack C3 ([[phab:T247018|T247018]])
* 22:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[012].codfw.wmnet
* 22:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[789].codfw.wmnet
* 22:16 James_F: Purged trwiki logos from ATS/Varnish for [[phab:T247445|T247445]]
* 22:15 jforrester@deploy1001: Synchronized static/images/project-logos/: [trwiki] Restore pre-unblocking celebration logo versions [[phab:T247445|T247445]] (duration: 01m 09s)
* 21:42 ebernhardson: stop all mjolnir-kafka-bulk-daemons in eqiad except 1 to assist debugging
* 21:33 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2726268]: Downgrade kafka_python to 1.4.3 (duration: 05m 45s)
* 21:27 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2726268]: Downgrade kafka_python to 1.4.3
* 20:53 cdanis@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:52 cdanis@cumin2001: START - Cookbook sre.hosts.decommission
* 20:26 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.23 (duration: 01m 03s)
* 20:25 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.23
* 20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:53 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:36 ejegg: updated payments-wiki from {{Gerrit|03765b53de}} to {{Gerrit|86ce0361f9}}
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 volans: temporarily disabled puppet on A:dns-auth to deploy g/578506 [[phab:T233183|T233183]]
* 18:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 18:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgParsoidVariant, no longer read [[phab:T229015|T229015]] (duration: 01m 07s)
* 18:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop using wmgParsoidVariant, no longer varied [[phab:T229015|T229015]] (duration: 01m 08s)
* 17:53 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:53 moritzm: removed cas-2020-03-09.log and cas-2020-03-10.log on idp2001 (huge logs due to some debug log level for tracking down a performance issue)
* 16:36 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:25 liw: restarting Zuul to clear queues (in collab with James F)
* 14:49 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:41 volans: installed spicerack to 0.0.32-1 on cumin[12]001
* 14:25 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 01m 11s)
* 14:24 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:23 akosiaris@deploy1001: sync aborted: wmf-config/ProductionServices.php (duration: 02m 42s)
* 14:22 volans: uploaded spicerack_0.0.32-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 14:21 akosiaris: switch mediawiki to talk to eventgate-analytics via envoy
* 14:21 akosiaris@deploy1001: Started scap: wmf-config/ProductionServices.php
* 14:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 akosiaris: [[phab:T239779|T239779]] upload apertium-swe-nor_0.3.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-swe-dan_0.8.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-nno-nob_1.3.0-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 14:08 akosiaris: [[phab:T239779|T239779]] upload apertium-dan-nor_1.4.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 13:01 thcipriani: restarting gerrit unstuck the zuul server ([[phab:T246973|T246973]])
* 12:54 thcipriani: restarting gerrit to try to fix thread deadlock on zuul (cf: [[phab:T246973|T246973]] )
* 12:43 akosiaris: disconnect+connect jenkins from gearman server.
* 12:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:38 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:23 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:23 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 12:00 Lucas_WMDE: EU SWAT done
* 12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT (prod no-op): [[gerrit:578520{{!}}Don't use TwoColConflict as beta feature on labs (T247292)]], take II (duration: 01m 07s)
* 11:59 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT (prod no-op): [[gerrit:578520{{!}}Don't use TwoColConflict as beta feature on labs (T247292)]] (duration: 01m 09s)
* 11:56 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikibaseCirrusSearch/: SWAT: [[gerrit:578805{{!}}Wrap property EntitySearchHelper in PropertyDataTypeSearchHelper]] (duration: 01m 05s)
* 11:48 vgutierrez: restarting ats-backend on cp2004
* 11:25 moritzm: restarting slapd on serpens/seaborgium to pick up libidn security updates
* 11:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 11:16 _joe_: restarting zuul and zuul-merger on contint1001, they're stuck
* 11:11 moritzm: restarting exim on MXes to pick up libidn security updates
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Give normal 100 weight to es3 old masters - [[phab:T246072|T246072]]', diff saved to https://phabricator.wikimedia.org/P10685 and previous config saved to /var/cache/conftool/dbconfig/20200311-110334-marostegui.json
* 10:59 marostegui: Remove Mostrevisions from mwmaint1002 [[phab:T239072|T239072]]
* 10:42 vgutierrez: pool ncredir5002 - [[phab:T243391|T243391]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly give weight to es3 old masters - [[phab:T246072|T246072]]', diff saved to https://phabricator.wikimedia.org/P10684 and previous config saved to /var/cache/conftool/dbconfig/20200311-103802-marostegui.json
* 10:34 moritzm: restarting Apache on graphite*, kibana, netmon* to pick up libidn security updates
* 09:53 moritzm: installing postgresql-9.6 security updates on maps*
* 09:46 vgutierrez: depool and reimage ncredir5002 with buster - [[phab:T243391|T243391]]
* 09:43 marostegui: Finish es3 maintenance window [[phab:T246072|T246072]]
* 09:29 marostegui: Disconnect replication on all es3 hosts [[phab:T246072|T246072]]
* 09:18 marostegui: Set es1017 (es3 master) in read only on mysql [[phab:T246072|T246072]]
* 09:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set es3 as RO - [[phab:T246072|T246072]] (duration: 01m 08s)
* 09:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Set es3 as RO - [[phab:T246072|T246072]] (duration: 01m 08s)
* 09:01 moritzm: restarting Apache on puppetboard, people.wikimedia.org, webperf*, bromine, miscweb* to pick up libidn security updates
* 08:40 moritzm: installing libidn security updates
* 08:33 moritzm: installing libvpx security updates
* 08:10 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switch wdqs-internal to use envoy (duration: 01m 21s)
* 07:38 marostegui: fixcopyrightwiki_p views from labs hosts [[phab:T246055|T246055]]
* 01:40 ejegg: restarted recurring donation charge jobs
* 01:27 ejegg: restarted fundraising orphan donation rectifier jobs
* 01:20 ejegg: updated fundraising CiviCRM from {{Gerrit|c4b81b19b0}} to {{Gerrit|a301076871}}
* 01:19 ejegg: disabled orphan rectifier jobs for upgrade
* 00:24 eileen: civicrm revision changed from {{Gerrit|35651da117}} to {{Gerrit|c4b81b19b0}}, config revision is {{Gerrit|71c8cda115}}
* 00:16 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2375.codfw.wmnet
* 00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw237[0246].codfw.wmnet
* 00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw236[68].codfw.wmnet
* 00:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw23[66-76].codfw.wmnet
 
== 2020-03-10 ==
* 23:53 volker-e@deploy1001: Finished deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:  (duration: 00m 05s)
* 23:53 volker-e@deploy1001: Started deploy [design/style-guide@8eb1daf]: Deploy design/style-guide:
* 23:50 ejegg: disabled recurring donation charge jobs for upgrade
* 23:48 mutante: mw2376 - systemctl start apache2
* 23:45 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2376.codfw.wmnet
* 23:45 ebernhardson: start in-place reindex procedure on kowiki against eqiad and codfw