You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0))
imported>Stashbot
(cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye)
 
(667 intermediate revisions by 4 users not shown)
== 2022-12-07 ==
* 00:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 00:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1023.eqiad.wmnet with OS bullseye


== 2022-12-06 ==
* 23:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 23:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1023.eqiad.wmnet with OS bullseye
* 23:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 22:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:37 tgr_: UTC late backports done
* 22:36 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes1023 - cmjohnson@cumin1001"
* 22:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:26 tgr@deploy1002: Finished scap: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]] (duration: 18m 58s)
* 22:09 tgr@deploy1002: tgr and tgr: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:07 tgr@deploy1002: Started scap: Backport for [[gerrit:865131{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]], [[gerrit:865130{{!}}Fix UserDatabaseHelper::hasMainspaceEdits() (T324285)]]
* 21:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 21:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1029.eqiad.wmnet with OS bullseye
* 20:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cephosd1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
* 20:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1029.eqiad.wmnet with reason: host reimage
* 20:25 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 20:25 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
* 20:24 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 20:24 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1028.eqiad.wmnet with OS bullseye
* 20:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1029']
* 20:16 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
* 20:13 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1029']
* 20:06 eileen: civicrm upgraded from {{Gerrit|c9761fee}} to {{Gerrit|3ae68ab4}}
* 20:06 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1029']
* 20:01 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
* 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bullseye
* 19:57 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1028.eqiad.wmnet with reason: host reimage
* 19:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bullseye
* 19:55 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bullseye
* 19:45 ejegg: payments-wiki upgraded from {{Gerrit|a875f2b9}} to {{Gerrit|1914b6c7}}
* 19:40 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 19:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
* 19:38 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1027.eqiad.wmnet with OS bullseye
* 19:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
* 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
* 19:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]] (duration: 12m 43s)
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
* 19:32 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
* 19:32 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 19:22 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 19:21 cwhite@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1028']
* 19:21 ladsgroup@deploy1002: ladsgroup and jdlrobson: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 19:19 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:865128{{!}}Avoid syntax error on hover in grade C browsers (T324514)]]
* 19:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]]
* 19:16 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
* 19:14 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 19:12 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1027.eqiad.wmnet with reason: host reimage
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bullseye
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bullseye
* 19:03 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bullseye
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5006']
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5007']
* 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti5005']
* 18:56 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
* 18:55 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:45 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5007']
* 18:45 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5006']
* 18:44 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti5005']
* 18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns5003']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5006']
* 18:42 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs5005']
* 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash1027']
* 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 18:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 18:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns5003']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5006']
* 18:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs5005']
* 18:28 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['logstash1028']
* 18:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 18:27 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 18:27 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 18:26 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:18 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:07 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
* 18:07 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
* 18:07 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 09s)
* 18:06 robh@cumin2002: START - Cookbook sre.dns.netbox
* 18:05 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@4925134]: Revert Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
* 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 18:02 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1028']
* 18:02 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1028']
* 17:56 cwhite@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1027']
* 17:56 cwhite@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1027']
* 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5007.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:47 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:29 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1027.eqiad.wmnet with OS bullseye
* 17:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns5003.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5006.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:27 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs5005.mgmt.eqsin.wmnet with reboot policy FORCED
* 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5003
* 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5003
* 17:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5007
* 17:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5007
* 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5006
* 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 17:24 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5006
* 17:24 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti5005
* 17:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti5005
* 17:22 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 17:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5006
* 17:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5006
* 17:17 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1029.eqiad.wmnet with OS bullseye
* 17:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 17:12 cwhite@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash1028.eqiad.wmnet with OS bullseye
* 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 17:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 16:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 16:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 16:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 16:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 16:46 kostajh: UTC afternoon backports done
* 16:44 kharlan@deploy1002: Finished scap: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]] (duration: 10m 54s)
* 16:35 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:33 kharlan@deploy1002: Started scap: Backport for [[gerrit:860867{{!}}GrowthExperiments: Start oldimpact experiment (T323526)]]
* 16:32 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1027.eqiad.wmnet with OS bullseye
* 16:30 kharlan@deploy1002: Finished scap: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]] (duration: 10m 14s)
* 16:23 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1028.eqiad.wmnet with OS bullseye
* 16:21 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs5005
* 16:21 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs5005
* 16:21 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1029.eqiad.wmnet with OS bullseye
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:21 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
* 16:19 kharlan@deploy1002: Started scap: Backport for [[gerrit:862840{{!}}GrowthExperiments: Enable new impact module on pilot wikis (T323686)]]
* 16:18 kharlan@deploy1002: backport aborted:  (duration: 02m 53s)
* 16:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqsin new hosts - robh@cumin2002"
* 16:15 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:13 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] (duration: 29m 43s)
* 16:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:10 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:02 kharlan@deploy1002: kharlan and urbanecm and kharlan: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.
* 15:45 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance (duration: 00m 17s)


== 2020-11-27 ==
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:50 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 15:06 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 14:50 elukey: roll restart zookeeper on druid* nodes for openjdk upgrades
* 14:50 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 10:52 jayme: updated helmfile to 0.135.0-1 on deploy*,contint*
* 10:51 jayme: updated helm-diff to 3.1.3-1 on contint*
* 10:49 jayme: updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum*
* 10:06 jayme: updated helm and helmfile on deploy2001
* 10:04 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:00 jayme: imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 08:05 elukey: roll restart druid public cluster for openjdk upgrades
* 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 06:39 marostegui: Stop mysql on es1015 [[phab:T268810|T268810]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json
* 06:30 marostegui: Remove es1016 from tendril and zarcillo [[phab:T268812|T268812]]
* 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for decommissioning [[phab:T268810|T268810]]', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json


== 2020-11-26 ==
* 17:18 jayme: downgrade helmfile to 0.125.2-1 on deploy*
* 17:05 jayme: updated helm-diff and helmfile on deploy100* and deploy200*
* 16:34 jayme: imported helm-diff 3.1.3-1 into buster-wikimedia and stretch-wikimedia
* 15:01 moritzm: installing libonig security updates
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13452 and previous config saved to /var/cache/conftool/dbconfig/20201126-144446-root.json
* 14:38 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 14:36 moritzm: installing zeromq3 security updates for stretch
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 14:35 jbond42: failing idp back to idp2001
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13451 and previous config saved to /var/cache/conftool/dbconfig/20201126-142942-root.json
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
* 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 14:23 moritzm: remove labtestpuppetmaster2001 from debmonitor [[phab:T258103|T258103]]
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13450 and previous config saved to /var/cache/conftool/dbconfig/20201126-141439-root.json
* 13:52 elukey: roll restart druid daemons on druid analytics to pick up new openjdk upgrades
* 13:52 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:52 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:52 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 13:50 moritzm: installing python3.5 security updates
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P13449 and previous config saved to /var/cache/conftool/dbconfig/20201126-133204-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13448 and previous config saved to /var/cache/conftool/dbconfig/20201126-132918-root.json
* 13:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13447 and previous config saved to /var/cache/conftool/dbconfig/20201126-131414-root.json
* 13:07 hnowlan: testing depooling kartotherian on maps2004 to reduce load
* 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
* 13:01 jbond42: update puppet_compiler on compiler1003
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13446 and previous config saved to /var/cache/conftool/dbconfig/20201126-125911-root.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P13445 and previous config saved to /var/cache/conftool/dbconfig/20201126-124253-marostegui.json
* 12:31 jbond42: fail over idp.wikimedia.org
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:53 moritzm: rebooting seaborgium for kernel update
* 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:40 marostegui: Deploy schema change on s8 codfw - there will be lag on s8 codfw - [[phab:T268004|T268004]]
* 11:16 moritzm: restarting archiva to pick up Java security update
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13442 and previous config saved to /var/cache/conftool/dbconfig/20201126-104324-root.json
* 10:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13441 and previous config saved to /var/cache/conftool/dbconfig/20201126-102820-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13440 and previous config saved to /var/cache/conftool/dbconfig/20201126-101317-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13439 and previous config saved to /var/cache/conftool/dbconfig/20201126-095813-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P13438 and previous config saved to /var/cache/conftool/dbconfig/20201126-094729-marostegui.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P13437 and previous config saved to /var/cache/conftool/dbconfig/20201126-094702-marostegui.json
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P13436 and previous config saved to /var/cache/conftool/dbconfig/20201126-094639-marostegui.json
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13435 and previous config saved to /var/cache/conftool/dbconfig/20201126-094538-root.json
* 09:38 marostegui: Stop mysql on es1016 for decommission
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13434 and previous config saved to /var/cache/conftool/dbconfig/20201126-093035-root.json
* 09:26 ema: deployment-cache-text06: upgrade Varnish to 6.0.7-1wm1 [[phab:T268736|T268736]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13433 and previous config saved to /var/cache/conftool/dbconfig/20201126-091532-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13432 and previous config saved to /var/cache/conftool/dbconfig/20201126-090028-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P13431 and previous config saved to /var/cache/conftool/dbconfig/20201126-084903-marostegui.json
* 08:40 elukey: roll restart cassandra on aqs10* for openjdk upgrades
* 08:40 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 08:09 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 08:08 marostegui: Deploy schema change on s7 codfw - there will be lag on s7 codfw - [[phab:T268004|T268004]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13430 and previous config saved to /var/cache/conftool/dbconfig/20201126-072506-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13429 and previous config saved to /var/cache/conftool/dbconfig/20201126-071514-root.json
* 07:12 marostegui: Enable GTID on clouddb1018:3317 clouddb1014:3317 [[phab:T267090|T267090]]
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13428 and previous config saved to /var/cache/conftool/dbconfig/20201126-071003-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13427 and previous config saved to /var/cache/conftool/dbconfig/20201126-070010-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13426 and previous config saved to /var/cache/conftool/dbconfig/20201126-065500-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13425 and previous config saved to /var/cache/conftool/dbconfig/20201126-064507-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13424 and previous config saved to /var/cache/conftool/dbconfig/20201126-063956-root.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1016 from dbctl', diff saved to https://phabricator.wikimedia.org/P13423 and previous config saved to /var/cache/conftool/dbconfig/20201126-063234-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13422 and previous config saved to /var/cache/conftool/dbconfig/20201126-063003-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13421 and previous config saved to /var/cache/conftool/dbconfig/20201126-062811-marostegui.json
* 06:17 marostegui: Stop mysql on db1124:3315 to clone clouddb1016:3315 [[phab:T267090|T267090]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for schema change', diff saved to https://phabricator.wikimedia.org/P13420 and previous config saved to /var/cache/conftool/dbconfig/20201126-061552-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P13419 and previous config saved to /var/cache/conftool/dbconfig/20201126-061459-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P13418 and previous config saved to /var/cache/conftool/dbconfig/20201126-061432-marostegui.json
* 06:08 ryankemper: [[phab:T268770|T268770]] [eqiad] Finished rolling restart of cirrus eqiad. All cirrus elasticsearch restarts are now complete (cloudelastic, relforge, eqiad, codfw)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 04:24 ryankemper: [[phab:T268770|T268770]] [eqiad] Begin rolling restart of cirrus eqiad, 3 nodes at a time
* 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 03:07 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I805699ecfa}} (duration: 00m 58s)
* 15:44 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@6377d4c]: Deploying image_suggestions 0.5.0 on platform_eng Airflow instance
* 15:43 kharlan@deploy1002: Started scap: Backport for [[gerrit:864909{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]], [[gerrit:865082{{!}}Localisation updates from https://translatewiki.net.]], [[gerrit:864911{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]]
* 15:41 reedy@deploy1002: Synchronized php-1.40.0-wmf.13/extensions/SecurePoll/includes/Pages/ListPager.php: [[phab:T324556|T324556]] (duration: 07m 01s)
* 15:33 reedy@deploy1002: Synchronized php-1.40.0-wmf.12/extensions/SecurePoll/includes/Pages/ListPager.php: [[phab:T324556|T324556]] (duration: 07m 13s)
* 15:20 kharlan@deploy1002: Finished scap: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]] (duration: 10m 48s)
* 15:13 kharlan@deploy1002: kharlan and urbanecm: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:10 kharlan@deploy1002: Started scap: Backport for [[gerrit:865077{{!}}Localisation updates from https://translatewiki.net.]]
* 14:52 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]] (duration: 10m 07s)
* 14:44 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:42 kharlan@deploy1002: Started scap: Backport for [[gerrit:864915{{!}}Instrumentation: Monitor navigation duration, transferSize, first paint (T324198)]]
* 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 14:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:34 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 14:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:31 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 14:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 14:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:25 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:21 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]] (duration: 09m 04s)
* 14:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 14:13 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:12 kharlan@deploy1002: Started scap: Backport for [[gerrit:864919{{!}}NewImpact: Adjust hasMainspaceEditsCache check (T324285)]]
* 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 13:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 13:00 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]] (duration: 07m 57s)
* 12:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:54 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 12:52 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 12:52 kharlan@deploy1002: Started scap: Backport for [[gerrit:864912{{!}}Revert "resourceloader: Modern ES6 code should be forced to target mobile" (T323542)]]
* 12:49 moritzm: installing glibc security updates on buster
* 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:29 jnuche@deploy1002: Pruned MediaWiki: 1.40.0-wmf.10 (duration: 02m 09s)
* 12:27 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:27 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx (exit_code=1) rolling restart_daemons on A:wcqs-public
* 12:27 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]] (duration: 05m 52s)
* 12:21 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]]
* 12:14 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]]
* 12:10 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 10:59 kostajh: UTC morning deploys done
* 10:56 moritzm: installing freetype security updates
* 10:48 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] (duration: 31m 25s)
* 10:36 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:16 kharlan@deploy1002: Started scap: Backport for [[gerrit:864910{{!}}NewImpact: Show "999+" when we could not count edits/thanks (T324286)]]
* 09:37 kharlan@deploy1002: Finished scap: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]] (duration: 28m 05s)
* 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:09 kharlan@deploy1002: Started scap: Backport for [[gerrit:864908{{!}}User impact: Do not show impact module if user has no mainspace edits (T324285)]]
* 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42426 and previous config saved to /var/cache/conftool/dbconfig/20221206-064402-ladsgroup.json
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42425 and previous config saved to /var/cache/conftool/dbconfig/20221206-062856-ladsgroup.json
* 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42424 and previous config saved to /var/cache/conftool/dbconfig/20221206-061349-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42423 and previous config saved to /var/cache/conftool/dbconfig/20221206-055843-ladsgroup.json
* 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42422 and previous config saved to /var/cache/conftool/dbconfig/20221206-054030-ladsgroup.json
* 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42421 and previous config saved to /var/cache/conftool/dbconfig/20221206-053911-ladsgroup.json
* 05:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
* 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42420 and previous config saved to /var/cache/conftool/dbconfig/20221206-053850-ladsgroup.json
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42419 and previous config saved to /var/cache/conftool/dbconfig/20221206-052523-ladsgroup.json
* 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42418 and previous config saved to /var/cache/conftool/dbconfig/20221206-052343-ladsgroup.json
* 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42417 and previous config saved to /var/cache/conftool/dbconfig/20221206-051016-ladsgroup.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42416 and previous config saved to /var/cache/conftool/dbconfig/20221206-050837-ladsgroup.json
* 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42415 and previous config saved to /var/cache/conftool/dbconfig/20221206-045510-ladsgroup.json
* 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42414 and previous config saved to /var/cache/conftool/dbconfig/20221206-045330-ladsgroup.json
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42413 and previous config saved to /var/cache/conftool/dbconfig/20221206-043348-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42412 and previous config saved to /var/cache/conftool/dbconfig/20221206-043326-ladsgroup.json
* 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42411 and previous config saved to /var/cache/conftool/dbconfig/20221206-042850-ladsgroup.json
* 04:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 04:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42410 and previous config saved to /var/cache/conftool/dbconfig/20221206-042828-ladsgroup.json
* 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42409 and previous config saved to /var/cache/conftool/dbconfig/20221206-041820-ladsgroup.json
* 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42408 and previous config saved to /var/cache/conftool/dbconfig/20221206-041322-ladsgroup.json
* 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42407 and previous config saved to /var/cache/conftool/dbconfig/20221206-040313-ladsgroup.json
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.13  refs [[phab:T320518|T320518]]
* 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42406 and previous config saved to /var/cache/conftool/dbconfig/20221206-035815-ladsgroup.json
* 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42405 and previous config saved to /var/cache/conftool/dbconfig/20221206-034806-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42404 and previous config saved to /var/cache/conftool/dbconfig/20221206-034309-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42403 and previous config saved to /var/cache/conftool/dbconfig/20221206-032818-ladsgroup.json
* 03:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42402 and previous config saved to /var/cache/conftool/dbconfig/20221206-032756-ladsgroup.json
* 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42401 and previous config saved to /var/cache/conftool/dbconfig/20221206-031250-ladsgroup.json
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42400 and previous config saved to /var/cache/conftool/dbconfig/20221206-025831-ladsgroup.json
* 02:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42399 and previous config saved to /var/cache/conftool/dbconfig/20221206-025821-ladsgroup.json
* 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42398 and previous config saved to /var/cache/conftool/dbconfig/20221206-025743-ladsgroup.json
* 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42397 and previous config saved to /var/cache/conftool/dbconfig/20221206-024314-ladsgroup.json
* 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42396 and previous config saved to /var/cache/conftool/dbconfig/20221206-024236-ladsgroup.json
* 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42395 and previous config saved to /var/cache/conftool/dbconfig/20221206-022817-ladsgroup.json
* 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42394 and previous config saved to /var/cache/conftool/dbconfig/20221206-022808-ladsgroup.json
* 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42393 and previous config saved to /var/cache/conftool/dbconfig/20221206-021638-ladsgroup.json
* 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42392 and previous config saved to /var/cache/conftool/dbconfig/20221206-021617-ladsgroup.json
* 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42391 and previous config saved to /var/cache/conftool/dbconfig/20221206-021310-ladsgroup.json
* 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42390 and previous config saved to /var/cache/conftool/dbconfig/20221206-021301-ladsgroup.json
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42389 and previous config saved to /var/cache/conftool/dbconfig/20221206-020110-ladsgroup.json
* 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42388 and previous config saved to /var/cache/conftool/dbconfig/20221206-015757-ladsgroup.json
* 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42387 and previous config saved to /var/cache/conftool/dbconfig/20221206-014604-ladsgroup.json
* 01:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42386 and previous config saved to /var/cache/conftool/dbconfig/20221206-014251-ladsgroup.json
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42385 and previous config saved to /var/cache/conftool/dbconfig/20221206-014046-ladsgroup.json
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42384 and previous config saved to /var/cache/conftool/dbconfig/20221206-014038-ladsgroup.json
* 01:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 01:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
* 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42383 and previous config saved to /var/cache/conftool/dbconfig/20221206-014017-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42382 and previous config saved to /var/cache/conftool/dbconfig/20221206-013057-ladsgroup.json
* 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42381 and previous config saved to /var/cache/conftool/dbconfig/20221206-012812-ladsgroup.json
* 01:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 01:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42380 and previous config saved to /var/cache/conftool/dbconfig/20221206-012750-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42378 and previous config saved to /var/cache/conftool/dbconfig/20221206-012510-ladsgroup.json
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42377 and previous config saved to /var/cache/conftool/dbconfig/20221206-011244-ladsgroup.json
* 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42376 and previous config saved to /var/cache/conftool/dbconfig/20221206-011128-ladsgroup.json
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42375 and previous config saved to /var/cache/conftool/dbconfig/20221206-011033-ladsgroup.json
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42374 and previous config saved to /var/cache/conftool/dbconfig/20221206-011003-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42373 and previous config saved to /var/cache/conftool/dbconfig/20221206-005737-ladsgroup.json
* 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42372 and previous config saved to /var/cache/conftool/dbconfig/20221206-005526-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42371 and previous config saved to /var/cache/conftool/dbconfig/20221206-005457-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42370 and previous config saved to /var/cache/conftool/dbconfig/20221206-005401-ladsgroup.json
* 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42369 and previous config saved to /var/cache/conftool/dbconfig/20221206-005339-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42368 and previous config saved to /var/cache/conftool/dbconfig/20221206-005244-ladsgroup.json
* 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42367 and previous config saved to /var/cache/conftool/dbconfig/20221206-005223-ladsgroup.json
* 00:51 cstone: payments-wiki upgraded from {{Gerrit|b613ddfb}} to {{Gerrit|0cd7e779}}
* 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42366 and previous config saved to /var/cache/conftool/dbconfig/20221206-004231-ladsgroup.json
* 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42365 and previous config saved to /var/cache/conftool/dbconfig/20221206-003833-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42364 and previous config saved to /var/cache/conftool/dbconfig/20221206-003716-ladsgroup.json
* 00:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42363 and previous config saved to /var/cache/conftool/dbconfig/20221206-002945-ladsgroup.json
* 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42362 and previous config saved to /var/cache/conftool/dbconfig/20221206-002326-ladsgroup.json
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42361 and previous config saved to /var/cache/conftool/dbconfig/20221206-002210-ladsgroup.json
* 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42360 and previous config saved to /var/cache/conftool/dbconfig/20221206-001438-ladsgroup.json
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42359 and previous config saved to /var/cache/conftool/dbconfig/20221206-000820-ladsgroup.json
* 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42358 and previous config saved to /var/cache/conftool/dbconfig/20221206-000703-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42357 and previous config saved to /var/cache/conftool/dbconfig/20221206-000654-ladsgroup.json
* 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42356 and previous config saved to /var/cache/conftool/dbconfig/20221206-000633-ladsgroup.json
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42355 and previous config saved to /var/cache/conftool/dbconfig/20221206-000444-ladsgroup.json
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42354 and previous config saved to /var/cache/conftool/dbconfig/20221206-000329-ladsgroup.json


== 2020-11-25 ==
== 2022-12-05 ==
* 23:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
* 22:55 mutante: mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script ([[phab:T245757|T245757]])
* 23:57 tzatziki: removing 2 files for legal compliance
* 22:47 ejegg: increased Ingenico API call timeout
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
* 22:34 shdubsh: beginning rolling restart of logstash cluster - eqiad
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 22:23 akosiaris@cumin1001: START - Cookbook
* 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
* 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]:


== 2020-11-24 ==
== 2022-12-04 ==
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 04:19 TheresNoTime: [[phab:T302486|T302486]] : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`
* 23:50 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] p2
* 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]] (duration: 01m 51s)
* 23:48 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next [[phab:T266488|T266488]]
* 21:27 andrewbogott: restarting slapd on serpens
* 21:20 cdanis: ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service
* 21:17 andrewbogott: restarting slapd on seaborgium
* 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - [[phab:T254606|T254606]] (duration: 01m 01s)
* 19:49 ryankemper: [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service`
* 19:30 gilles@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/NavigationTiming/extension.json: (no justification provided) (duration: 00m 57s)
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|331a129}}: Remove temporary feature flags ([[phab:T258116|T258116]]) (duration: 00m 57s)
* 19:20 mutante: LDAP - added derick to group nda ([[phab:T268150|T268150]])
* 19:17 moritzm: installing Java security updates on elastic* and relforge*
* 19:09 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:643260 group1: Switch ParserCache to JSON (duration: 00m 57s)
* 18:59 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:56 elukey@deploy1001: Finished deploy [analytics/refinery@1ff0868]: Regular analytics weekly train (duration: 09m 50s)
* 18:56 volans: migrating anycast zonefile to the Netbox-generated ones - [[phab:T258729|T258729]]
* 18:55 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:46 elukey@deploy1001: Started deploy [analytics/refinery@1ff0868]: Regular analytics weekly train
* 18:46 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2 (duration: 00m 05s)
* 18:45 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] p2
* 18:45 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]] (duration: 01m 09s)
* 18:45 elukey: restart memcached on mw2339 to pick up the correct port (was bound on 11211 rather than 11210)
* 18:44 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next [[phab:T266488|T266488]]
* 18:19 ejegg: updated Fundraising CiviCRM from {{Gerrit|28464df973}} to {{Gerrit|fb0ad7f39b}}
* 18:07 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:06 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:04 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:51 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:29 elukey: move analytics1064 from C2 to C3 eqiad - [[phab:T267065|T267065]]
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:06 hnowlan: finished removing restbase2009 from cassandra cluster
* 16:01 cmjohnson1: replacing the sfp at cr1-eqiad xe-3/2/1 [[phab:T267672|T267672]]
* 15:42 marostegui: Drop kraken user from s4 - [[phab:T268636|T268636]]
* 15:38 elukey: move druid1005 from rack B7 to B6 - [[phab:T267065|T267065]]
* 15:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:28 jayme: pushed docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0 docker-registry.discovery.wmnet/calico/node:v3.17.0 docker-registry.discovery.wmnet/calico/typha:v3.17.0
* 15:23 jayme: imported calico 3.17.0 into component/calico-future for stretch-wikimedia
* 15:07 godog: swift eqiad-prod: decom ms-be1022 ssd from swift - [[phab:T267870|T267870]]
* 15:01 marostegui: Enable GTID on clouddb1013:3311 clouddb1015:3314 clouddb1017:3311 clouddb1019:3314 [[phab:T267090|T267090]]
* 14:58 elukey: move analytics1072 from rack B2 to B3 - [[phab:T267065|T267065]]
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:53 jayme: imported helmfile 0.135.0-1 into buster-wikimedia and stretch-wikimedia
* 14:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P13392 and previous config saved to /var/cache/conftool/dbconfig/20201124-144219-marostegui.json
* 14:34 liw: finished testing Scap on Beta cluster in prep for https://phabricator.wikimedia.org/T268634
* 14:31 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:27 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13391 and previous config saved to /var/cache/conftool/dbconfig/20201124-141912-root.json
* 14:09 moritzm: reset-failed idp-u2f.service after Hiera change (one time issue, will soon be obsolete)
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13390 and previous config saved to /var/cache/conftool/dbconfig/20201124-140409-root.json
* 13:52 elukey@deploy1001: Finished deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252 (duration: 00m 05s)
* 13:52 elukey@deploy1001: Started deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13389 and previous config saved to /var/cache/conftool/dbconfig/20201124-134905-root.json
* 13:40 marostegui: Stop MySQL on db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone clouddb1018 and clouddb1014 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13388 and previous config saved to /var/cache/conftool/dbconfig/20201124-133709-marostegui.json
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13387 and previous config saved to /var/cache/conftool/dbconfig/20201124-133402-root.json
* 13:13 jgleeson: civicrm revision is {{Gerrit|28464df973}}, config revision is {{Gerrit|928918a9b6}}
* 13:01 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.18
* 13:01 liw: done testing Scap release candidate on beta (failed: disk full on deploy01)
* 12:49 hnowlan: disabled cassandra service on restbase2009, starting drain
* 12:30 liw: testing upcoming Scap release on beta
* 12:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:59 jayme: imported helm3 3.4.1-1 into buster-wikimedia and stretch-wikimedia
* 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:52 XioNoX: push CR641949 and CR641949
* 11:38 effie: rolling depool and pool app and api clusters - [[phab:T244340|T244340]]
* 11:25 _joe_: rebuild docker images for [[phab:T268612|T268612]]
* 11:20 effie: disable puppet on api and app servers to rollout onhost memcached - [[phab:T244340|T244340]]
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:14 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 marostegui: Stop mysql on db1125:3312 to clone clouddb1014:3312 and clouddb1018:3312 - [[phab:T267090|T267090]]
* 10:45 moritzm: upgrading seaborgium to Buster
* 10:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:31 jbond42: upload new cas package to wikimedia-buster
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2073', diff saved to https://phabricator.wikimedia.org/P13384 and previous config saved to /var/cache/conftool/dbconfig/20201124-100139-marostegui.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2026', diff saved to https://phabricator.wikimedia.org/P13383 and previous config saved to /var/cache/conftool/dbconfig/20201124-100020-marostegui.json
* 09:48 volans: Migrating codfw private/public primary DNS records to the auto-generated ones from Netbox - [[phab:T258729|T258729]]
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13382 and previous config saved to /var/cache/conftool/dbconfig/20201124-094449-marostegui.json
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P13381 and previous config saved to /var/cache/conftool/dbconfig/20201124-094159-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13380 and previous config saved to /var/cache/conftool/dbconfig/20201124-094052-marostegui.json
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P13379 and previous config saved to /var/cache/conftool/dbconfig/20201124-093517-marostegui.json
* 09:23 marostegui: Deploy schema change on db2114 and db1096:3316 - [[phab:T268004|T268004]]
* 09:13 ema: cp4032: switch back to varnish 6.0.6-1wm2 after [[phab:T264398|T264398]] experiment, fix [[phab:T268243|T268243]]
* 09:09 elukey: drop principals and keytabs for analytics10[42-57] - [[phab:T267932|T267932]]
* 09:03 gilles@deploy1001: Finished deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)
* 09:03 gilles@deploy1001: Started deploy [performance/navtiming@ba6cd0d]: [[phab:T260580|T260580]] Parse user agents in navtiming instead of relying on eventlogging to do it
* 08:49 _joe_: uploading the base production docker images for MediaWiki, [[phab:T265324|T265324]]
* 08:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:43 _joe_: refreshing debian buster base image
* 08:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:31 marostegui: Deploy user for pki database for dbproxy1012, dbproxy1014, dbproxy2001 - [[phab:T268329|T268329]]
* 08:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:27 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:58 godog: swift eqiad-prod: add weight to ms-be106[0-3] - [[phab:T268435|T268435]]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13378 and previous config saved to /var/cache/conftool/dbconfig/20201124-074342-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13377 and previous config saved to /var/cache/conftool/dbconfig/20201124-073202-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13376 and previous config saved to /var/cache/conftool/dbconfig/20201124-073125-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13375 and previous config saved to /var/cache/conftool/dbconfig/20201124-072755-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13374 and previous config saved to /var/cache/conftool/dbconfig/20201124-072715-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13373 and previous config saved to /var/cache/conftool/dbconfig/20201124-072249-marostegui.json
* 07:00 _joe_: changing the mtail recipe for mediawiki/apache to use an actual histogram
* 06:31 marostegui: Sanitize clouddb1019:3314 [[phab:T267090|T267090]]
* 06:28 marostegui: Sanitize clouddb1015:3314 [[phab:T267090|T267090]]
* 03:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 03:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 05s)
* 00:29 reedy@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls [[phab:T268583|T268583]] (duration: 01m 06s)


== 2020-11-23 ==
== 2022-12-03 ==
* 22:56 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 22:52 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 22:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:54 mutante: mwdebug1003 - removing php packages and letting puppet reinstall them after it has the correct APT config [[phab:T267248|T267248]]
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 mutante: mwdebug1003 - scap pull because <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mwdebug1003 is CRITICAL
* 20:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:09 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 04s)
* 20:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 20:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 00m 42s)
* 19:22 Urbanecm: Morning B&C done
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a110db09adf95edb38f663c19ce596e817ecf55d}}: group1: switch ParserCache to JSON ([[phab:T263579|T263579]]) (duration: 01m 05s)
* 19:15 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.18)
* 19:12 Urbanecm: Synced security patch for [[phab:T120883|T120883]] (wmf.16)
* 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7561926e1dede35c2ad27d587c044a5ebf5e6648}}: GrowthExperiments: Enable help panel top-posting on svwiki, ruwiki ([[phab:T268227|T268227]]) (duration: 01m 06s)
* 17:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:41 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2010.codfw.wmnet
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:29 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 17:22 mutante: DNS - new project language 'skr' added - Saraiki ( سرائیکی Sarā'īkī, also spelt Siraiki, or Seraiki) is an Indo-Aryan language of the Lahnda group, spoken in the south-western half of the province of Punjab in Pakistan.
* 17:12 elukey: move aqs1004 from rack A4 to A3 - [[phab:T267065|T267065]]
* 17:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:58 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:37 elukey: move analytics1070 from rack A7 to rack A5 - [[phab:T267065|T267065]]
* 15:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:13 godog: add ipv6 forward/reverse records for grafana1002 / grafana2001
* 15:05 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 14:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2009.codfw.wmnet
* 14:10 kormat: cleaning up heartbeat.heartbeat on pc3 [[phab:T268336|T268336]]
* 14:09 kormat: cleaning up heartbeat.heartbeat on pc2 [[phab:T268336|T268336]]
* 14:04 kormat: cleaning up heartbeat.heartbeat on pc1 [[phab:T268336|T268336]]
* 14:01 moritzm: imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia [[phab:T245757|T245757]]
* 13:56 XioNoX: push CR641960
* 13:56 godog: add ms-be106[0-3] to eqiad-prod with minimal weight - [[phab:T268435|T268435]]
* 13:17 moritzm: imported ploticus 2.42-4.2~wmf1 to buster-wikimedia [[phab:T245757|T245757]]
* 13:11 Lucas_WMDE: EU backport+config window done
* 13:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/Wikibase: Backport: [[gerrit:642103{{!}}Calculate page props on-the-fly during RDF dump (T145712)]] (duration: 01m 14s)
* 13:01 hnowlan: started cassandra pooling maps2009
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13370 and previous config saved to /var/cache/conftool/dbconfig/20201123-125815-marostegui.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13369 and previous config saved to /var/cache/conftool/dbconfig/20201123-125759-marostegui.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141 after schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13368 and previous config saved to /var/cache/conftool/dbconfig/20201123-125417-marostegui.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change [[phab:T267335|T267335]] [[phab:T267399|T267399]]', diff saved to https://phabricator.wikimedia.org/P13367 and previous config saved to /var/cache/conftool/dbconfig/20201123-125345-marostegui.json
* 12:34 Lucas_WMDE: Undeployed patch for [[phab:T260349|T260349]]
* 12:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2008.codfw.wmnet
* 12:32 Urbanecm: Run scap pull at mwdebug1003
* 12:28 marostegui: Stop mysql on db1121 to clone clouddb1017:3314 clouddb1019:3314
* 12:27 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone clouddb1017:3314 clouddb1019:3314 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13366 and previous config saved to /var/cache/conftool/dbconfig/20201123-122549-marostegui.json
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c00d7e8e4c407b76aa2930dfa040394e874d77bc}}: Move ContentTranslation out of Beta for br, ka, ast, si and ig WPs ([[phab:T267212|T267212]], [[phab:T266217|T266217]], [[phab:T266218|T266218]], [[phab:T266219|T266219]], [[phab:T266220|T266220]]) (duration: 01m 06s)
* 12:01 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; [[phab:T246539|T246539]])
* 11:49 XioNoX: eqiad row A, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 05s)
* 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:643018{{!}} Bumping portals to master (T128546)]] (duration: 01m 21s)
* 11:25 hnowlan: starting cassandra bootstrap of maps2008
* 11:20 effie: enable puppet on cp* hosts
* 11:16 moritzm: installing poppler security updates on stretch
* 11:13 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 11:13 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:05 XioNoX: eqiad row A, standardize interfaces descriptions and ranges order
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:26 effie: disable puppet on cp* hosts to merge 641730
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:26 moritzm: rebooting serpens
* 10:21 XioNoX: eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms
* 09:48 XioNoX: eqiad row B, standardize interfaces descriptions and ranges order
* 08:46 elukey: drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster)
* 08:43 godog: start stress testing on ms-be106* - [[phab:T268435|T268435]]
* 08:41 elukey: drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster)
* 08:36 elukey: drop analytics1028's krb principals from krb1001 - old decommed node
* 08:35 moritzm: installing remaining krb5 security updates for Stretch
* 07:27 marostegui: Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on commonswiki on wikireplicas - [[phab:T267090|T267090]]
* 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:46 marostegui: Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing [[phab:T267090|T267090]]


== 2020-11-21 ==
== 2022-12-02 ==
* 09:18 joal: Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60Tb
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:17 joal: Drop historical logs of '
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 08:28 ariel@deploy1001: Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s)
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 08:28 ariel@deploy1001: Started deploy [dumps/dumps@1a76a9a]: revinfo updates
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:10 elukey: remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:05 elukey: remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 19:11 sukhe: restart pybal on lvs5004
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 12:09 jynus: dropping all databases from db1133
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster


== 2020-11-20 ==
== 2022-12-01 ==
* 23:38 mutante: synced puppet-compiler facts - new hosts should be usable in compiler
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 22:30 mutante: cumin1001 - sudo systemctl start cumin-check-aliases ->  <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK  [[phab:T268369|T268369]]
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS


== 2020-11-19 ==
== Archives ==
* 23:59 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:06 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:23 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:07 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:06 krinkle@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo/: [[phab:T267668|T267668]] - {{Gerrit|I1115135ee}}, and {{Gerrit|Ic239bb9807}} (duration: 01m 07s)
* 20:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:12 herron: upgraded logstash-next to kibana 7.10
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:48 mutante: gerrit1001 - re-enabling puppet after merging gerrit:642086 for [[phab:T268260|T268260]] (upstream bug 13701)
* 18:41 mutante: gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%<nowiki>{</nowiki>REQUEST_SCHEME<nowiki>}</nowiki> in apache config, reloaded apache to fix redirect issue
* 18:37 mutante: gerrit1001 - disabled puppet
* 18:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 17:59 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:47 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 17:33 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s)
* 17:33 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5
* 17:32 hashar: Upgrading Gerrit to 3.2.5 and restarting it
* 17:05 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s)
* 17:04 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 16:59 ryankemper: [[phab:T246345|T246345]] [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy
* 16:58 kormat: started mariadb on pc2010, now with more 🤞
* 16:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:54 kormat: stopping mariadb on pc2010
* 16:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:43 hashar: Restarting Gerrit replica instance on gerrit2001
* 16:42 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s)
* 16:42 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server)
* 16:41 kormat: stopped and started replication on pc2010 to see if that would help it recover
* 16:40 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s)
* 16:40 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5
* 16:35 elukey: roll restart hadoop workers for openjdk upgrades
* 16:35 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 15:58 moritzm: installing jupyter-notebook security updates on an-coord*
* 15:56 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 15:52 bblack: dns*: upgrade to gdnsd-3.4.0 on the remainder of the dns fleet
* 15:44 bblack: dns3001: upgrade gdnsd to 3.4.0
* 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:41 bblack: dns1001: upgrade gdnsd to 3.4.0
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:36 bblack: dns3002: upgrade gdnsd to 3.4.0
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:31 bblack: authdns1001: upgrade gdnsd to 3.4.0
* 15:30 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:57 moritzm: installing openldap security updates on buster (client side tools/libs, slapd already updated)
* 14:54 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:47 marostegui: Sanitize enwiki on clouddb1017 [[phab:T267090|T267090]]
* 14:45 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:43 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:41 marostegui: Sanitize enwiki on clouddb1013 [[phab:T267090|T267090]]
* 14:39 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 14:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:29 moritzm: rolling restart of app server canaries to pick up latest sec updates
* 14:21 moritzm: installing krb5 security updates on stretch
* 14:02 bblack: authdns2001: upgrade gdnsd to 3.4.0
* 13:45 XioNoX: push current state of audited cloud-in4 filter - [[phab:T264993|T264993]]
* 13:42 moritzm: removing stray wireshark 2.2.6 wireshark libs on Stretch
* 13:32 moritzm: installing wireshark security updates
* 13:30 bblack: dns4002: upgrade gdnsd to 3.4.0
* 13:28 bblack: reprepro: updated buster-wikimedia gdnsd package to 3.4.0-1~wmf1
* 12:43 moritzm: installing libproxy security updates on stretch
* 12:38 marostegui: Stop mysql on db1106 to clone clouddb1013 and clouddb1017 [[phab:T267090|T267090]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T267090|T267090]]', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json
* 12:00 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:44 moritzm: installing Java security updates on Hadoop/Kafka Jumbo hosts
* 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:00 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; [[phab:T246539|T246539]])
* 10:28 marostegui: Restart mysql on db1115, tendril and dbtree will be down for a few minutes
* 09:40 marostegui: Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - [[phab:T267090|T267090]]
* 09:29 moritzm: upgrading serpens to Buster
* 09:26 XioNoX: eqiad row C: move Ganeti/LVS interfaces to individual terms
* 09:07 elukey: restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary)
* 08:56 effie: disable puppet on mw canaries to merge 641816
* 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:49 elukey: restart hadoop daemons on analytics1058 for openjdk upgrades (canary)
* 08:25 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
* 08:19 XioNoX: eqiad row C: standardize interfaces config
* 07:55 XioNoX: eqiad row D: move Ganeti/LVS interfaces to individual terms
* 07:47 XioNoX: eqiad row D: standardize interfaces config
* 07:22 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 07:05 elukey: roll restart java daemons on Hadoop test for openjdk upgrades
* 07:05 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:21 marostegui: Remove es1014 from tendril and zarcillo [[phab:T268102|T268102]]
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:08 marostegui: Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - [[phab:T267090|T267090]]
* 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
 
== 2020-11-18 ==
* 23:34 mutante: disabling puppet on memcache::mediawiki - deploying gerrit:637742
* 22:56 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]] (duration: 00m 04s)
* 22:56 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after [[phab:T267269|T267269]]
* 22:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; [[phab:T268181|T268181]]) (duration: 01m 04s)
* 22:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; [[phab:T268181|T268181]]) (duration: 01m 06s)
* 22:05 urbanecm@deploy1001: Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list ([[phab:T268181|T268181]]) (duration: 01m 05s)
* 21:53 mutante: mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring [[phab:T267248|T267248]]
* 21:47 mutante: mwdebug1003 - scap pull - [[phab:T267248|T267248]]
* 21:40 mutante: mw1317,mw1318 - back in action and all monitoring activated again
* 21:17 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
* 21:02 mutante: mw1317,mw1318 - repooled=no after physical move to rack B
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 20:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 20:27 mutante: mw1317, mw1318 shutting down for physical move
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1318.eqiad.wmnet
* 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1317.eqiad.wmnet
* 20:15 mutante: mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 ([[phab:T266164|T266164]])
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
* 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 20:10 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 03s)
* 20:09 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
* 20:03 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 20:03 akosiaris@cumin1001: conftool action : set/weight=0; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 19:53 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 19:48 otto@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 06s)
* 19:45 otto@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - [[phab:T240460|T240460]] (duration: 01m 07s)
* 19:26 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:635607 - Switch ParserCache to JSON for group0 wikis (duration: 01m 05s)
* 19:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:635086 - Enable parsoid on api_appserver (duration: 01m 04s)
* 19:19 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:13 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:641527 - Set  to 0 (duration: 01m 04s)
* 18:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 17:18 elukey: shut down an-presto1004 for hw maintenance
* 17:13 akosiaris: [[phab:T241230|T241230]] pool codfw kubernetes for recommendation-api at a very low weight
* 17:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 17:12 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
* 16:52 jbond42: drop os_version/requires_os functions from wmflib
* 16:50 elukey: update /etc/krb5.keytab on krb1001/krb2001 to match the most up to date key version for host/krb2001.codfw.wmnet
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:44 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:38 reedy@deploy1001: Synchronized wmf-config/logging.php: [[phab:T268141|T268141]] (duration: 01m 06s)
* 16:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:27 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:56 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 Urbanecm: mwscript deleteEqualMessages.php --wiki=cswiki --delete
* 15:14 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:05 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:03 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 14:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:30 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:13 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php ([[phab:T264797|T264797]])
* 14:09 elukey: copied /etc/krb5.keytab from krb1001 to krb2001 (the last one contained only one principal for 2001, the first one both for 1001 and 2001)
* 14:05 moritzm: installing openldap security updates on ro replicas
* 14:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 14:02 elukey: restart krb5-kpropd.service on krb2001 to force the pick up of new client configs
* 13:35 bblack: cache_text: Executing "varnishadm -n frontend param.set nuke_limit 1000" - [[phab:T266373|T266373]]
* 13:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:30 moritzm: installing openldap security updates on corp replicas
* 13:08 Urbanecm: EU B&C done (~15 minutes ago)
* 12:43 akosiaris: sync staging cluster's helmfile.d/admin state. Aside from calico, the rest is a noop
* 12:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|5488f56c7458fa8fb9be5f41f131e00b26a84cc0}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 12:25 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: {{Gerrit|45d71a37f381e81e5382c8e10ac4063c9665beb8}}: Fix NewcomerTasksCacheRefreshJob ([[phab:T268008|T268008]]) (duration: 01m 05s)
* 12:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/<nowiki>{</nowiki>bnwiki,bnwiki-1.5x,bnwiki-2x<nowiki>}</nowiki>.png ([[phab:T265553|T265553]])
* 12:13 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=releases
* 12:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|70aabf7ec8e1b549e78978e48967fb70d21316de}}: Regenerate Bengali Wikipedia logo ([[phab:T265553|T265553]]) (duration: 01m 06s)
* 12:06 akosiaris@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=wikifeeds
* 12:01 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after restarting mysql [[phab:T266483|T266483]] (duration: 01m 06s)
* 12:00 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=blubberoid,name=eqiad
* 11:56 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; [[phab:T246539|T246539]])
* 11:56 marostegui: Restart mysql on pc1009 [[phab:T266483|T266483]]
* 11:56 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 11:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 01m 18s)
* 11:40 XioNoX: eqiad row D: remove un-needed "enable" keywords
* 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99)
* 10:59 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert
* 10:58 jbond42: renew sretest1002 ssl cert to test cookbook
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:25 godog: ms-be1022 - disable failed sdb
* 10:01 XioNoX: eqiad row D: Standardize interfaces descriptions
* 09:56 moritzm: uploaded libexif 0.6.21-2+deb8u4+wmf1 to jessie-wikimedia
* 09:22 elukey: set dns_canonicalize_hostname = false to all kerberos clients
* 09:13 jbond42: renew puppet certificate of seaborgium
* 08:34 marostegui: Stop MySQL on es1011, es1012, es1014 [[phab:T268100|T268100]] [[phab:T268101|T268101]] [[phab:T268102|T268102]]
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1012 from dbctl [[phab:T268101|T268101]]', diff saved to https://phabricator.wikimedia.org/P13326 and previous config saved to /var/cache/conftool/dbconfig/20201118-082942-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13325 and previous config saved to /var/cache/conftool/dbconfig/20201118-082636-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13324 and previous config saved to /var/cache/conftool/dbconfig/20201118-082618-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 80%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13323 and previous config saved to /var/cache/conftool/dbconfig/20201118-081115-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13322 and previous config saved to /var/cache/conftool/dbconfig/20201118-075612-root.json
* 07:45 marostegui: Deploy schema change on db1098:3316 [[phab:T267335|T267335]] [[phab:T267399|T267399]]
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13321 and previous config saved to /var/cache/conftool/dbconfig/20201118-074108-root.json
* 07:28 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; [[phab:T246539|T246539]])
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13320 and previous config saved to /var/cache/conftool/dbconfig/20201118-072605-root.json
* 07:16 marostegui: Run check table on s6 on db1125:3316 [[phab:T267090|T267090]]
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13319 and previous config saved to /var/cache/conftool/dbconfig/20201118-071101-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13318 and previous config saved to /var/cache/conftool/dbconfig/20201118-065558-root.json
* 06:53 elukey: restart also mirror maker on kafka-main1001/1003 (seems not related but just to clear old errors and a possible weird state)
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 100%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13317 and previous config saved to /var/cache/conftool/dbconfig/20201118-064556-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13316 and previous config saved to /var/cache/conftool/dbconfig/20201118-064054-root.json
* 06:37 elukey: restart kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1002 - consumer msg rate low since kafka-main2003 went down for codfw c7 failure
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 75%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13315 and previous config saved to /var/cache/conftool/dbconfig/20201118-063052-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Slowly pool es1032 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13314 and previous config saved to /var/cache/conftool/dbconfig/20201118-062551-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1014 from dbctl', diff saved to https://phabricator.wikimedia.org/P13313 and previous config saved to /var/cache/conftool/dbconfig/20201118-062547-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 50%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13312 and previous config saved to /var/cache/conftool/dbconfig/20201118-061549-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13311 and previous config saved to /var/cache/conftool/dbconfig/20201118-061340-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1027 as new es1 master', diff saved to https://phabricator.wikimedia.org/P13310 and previous config saved to /var/cache/conftool/dbconfig/20201118-061218-marostegui.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1011 from dbctl', diff saved to https://phabricator.wikimedia.org/P13309 and previous config saved to /var/cache/conftool/dbconfig/20201118-061112-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1032 with minimum weight on es1 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13308 and previous config saved to /var/cache/conftool/dbconfig/20201118-060641-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 25%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13307 and previous config saved to /var/cache/conftool/dbconfig/20201118-060045-root.json
* 05:47 marostegui: Run check table on enwiki on db1124:3311 [[phab:T267090|T267090]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 10%: Slowly pool es1018 after cloning es1032 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13306 and previous config saved to /var/cache/conftool/dbconfig/20201118-054542-root.json
* 00:53 tgr_: also deployed [[gerrit:641294{{!}}Suggested Edits: Guard against task type not existing (T268012)]]
* 00:52 tgr@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:641295{{!}}Suggested edits: Guard against empty topic data (T268015)]] (duration: 01m 07s)
* 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:641250{{!}}Enable watchlist expiry feature on Wikidata & Commons (T266874)]] (duration: 01m 03s)
 
== 2020-11-17 ==
* 22:54 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 00m 07s)
* 22:54 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
* 22:53 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 12m 51s)
* 22:45 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 22:40 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
* 22:39 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 22:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 22:10 mutante: otrs1001 - systemctl start otrs-cache-cleanup
* 22:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere (duration: 11m 07s)
* 22:07 mutante: otrs1001 - removing otrs-cache-cleanup cron from otrs's crontab - adding same command as systemd timer. gerrit:637038 [[phab:T265138|T265138]]
* 21:57 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere
* 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw (duration: 07m 11s)
* 21:24 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw
* 20:56 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.18
* 20:43 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; [[phab:T246539|T246539]])
* 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.18 (duration: 39m 37s)
* 19:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:52 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.18
* 19:50 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010 (duration: 02m 03s)
* 19:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010
* 19:46 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.11 (duration: 13m 05s)
* 19:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 19:18 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: wgEventStreamsDefaultSettings in beta should only set eqiad as topic prefix - [[phab:T253069|T253069]] (duration: 02m 26s)
* 19:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 19:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:38 ejegg: updated standalone SmashPig deployment from {{Gerrit|09f29c1da5}} to {{Gerrit|63dffcb11f}}
* 18:36 ejegg: updated fundraising python tools from {{Gerrit|68e054c9ad}} to {{Gerrit|41cab089da}}
* 18:09 jynus: stopping db1139 for hw maintenance [[phab:T261405|T261405]]
* 17:59 dpifke@deploy1001: Finished deploy [performance/navtiming@8eaf7db]: (no justification provided) (duration: 00m 05s)
* 17:58 dpifke@deploy1001: Started deploy [performance/navtiming@8eaf7db]: (no justification provided)
* 17:37 dpifke@deploy1001: Finished deploy [performance/coal@43b91df]: (no justification provided) (duration: 00m 06s)
* 17:37 dpifke@deploy1001: Started deploy [performance/coal@43b91df]: (no justification provided)
* 17:34 dpifke@deploy1001: Finished deploy [statsv/statsv@249d073]: (no justification provided) (duration: 00m 05s)
* 17:34 dpifke@deploy1001: Started deploy [statsv/statsv@249d073]: (no justification provided)
* 17:27 dpifke@deploy1001: Finished deploy [statsv/statsv@873ea90]: (no justification provided) (duration: 00m 05s)
* 17:27 dpifke@deploy1001: Started deploy [statsv/statsv@873ea90]: (no justification provided)
* 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:16 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55d4d41]: (no justification provided) (duration: 00m 04s)
* 17:16 dpifke@deploy1001: Started deploy [performance/arc-lamp@55d4d41]: (no justification provided)
* 17:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: (no justification provided) (duration: 00m 04s)
* 17:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: (no justification provided)
* 17:08 dpifke@deploy1001: Finished deploy [performance/coal@5a32eb2]: (no justification provided) (duration: 00m 04s)
* 17:08 dpifke@deploy1001: Started deploy [performance/coal@5a32eb2]: (no justification provided)
* 16:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 16:42 jbond42: re-enable puppet fleet wide
* 16:36 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:33 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:22 moritzm: uploaded zeromq3 4.0.5+dfsg-2+deb8u2+wmf1 to jessie-wikimedia
* 16:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:13 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 volans: powercycle ms-be1030.eqiad.wmnet, unresponsive to ping/ssh, no prompt in console, nothing in hw logs
* 15:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:27 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:16 jbond42: disable puppet fleet wide
* 15:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:59 cdanis@deploy1001: Synchronized docroot/thankyou: Special docroot for thankyouwiki [[phab:T259312|T259312]] {{Gerrit|d2a20ec57}} (duration: 00m 55s)
* 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:57 elukey: shutdown stat1008 for ram expansion
* 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:47 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:43 XioNoX: codfw row A: move ganeti and LVS from interface-range to individual term
* 14:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:37 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; [[phab:T246539|T246539]])
* 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:03 XioNoX: codfw row A: standardize interfaces
* 13:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:36 XioNoX: codfw row B: move ganeti, Cloud and LVS from interface-range to individual term
* 13:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:22 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:09 XioNoX: codfw row B: remove extra "enable"
* 12:59 Lucas_WMDE: EU backport&config window done (again ☺)
* 12:58 moritzm: updating idp-test* to 6.2.4-2
* 12:57 XioNoX: codfw row B: Standardize interfaces descriptions
* 12:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:641293{{!}}Suggested Edits: Guard against task type not existing (T268012)]] (duration: 00m 58s)
* 12:53 bblack: cpNNNN: removing old (30d+) failure reports from /var/cache/ocsp
* 12:42 moritzm: IDP updated to 6.2.4
* 12:33 Lucas_WMDE: reopen EU backport&config window
* 12:23 XioNoX: codfw row C: move ganeti and LVS from interface-range to individual term
* 12:15 XioNoX: codfw row C: remove extra "enable"
* 12:15 Lucas_WMDE: EU backport&config window done
* 12:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2006.codfw.wmnet
* 12:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:631496{{!}}Remove migration settings in InitialiseSettings.php (T264286)]], 2/2 (labs) (duration: 00m 56s)
* 12:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631496{{!}}Remove migration settings in InitialiseSettings.php (T264286)]], 1/2 (prod) (duration: 00m 56s)
* 12:05 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:631431{{!}}Remove migration settings in Wikibase.php (T264286)]] (duration: 00m 57s)
* 11:51 XioNoX: codfw row C: Standardize interfaces descriptions
* 10:46 marostegui: Run a test on check_private_data on clouddb1013 for s1 and s3 - [[phab:T267090|T267090]]
* 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 in pc2 after restarting mysql [[phab:T266483|T266483]] (duration: 00m 56s)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:19 marostegui: Restart mysql on pc1008 [[phab:T266483|T266483]]
* 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 00m 57s)
* 09:29 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 09:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:10 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 09:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:56 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:56 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1028 as new es3 master', diff saved to https://phabricator.wikimedia.org/P13301 and previous config saved to /var/cache/conftool/dbconfig/20201117-085542-marostegui.json
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 before decommissioning it and pool es1026 as new es2 master', diff saved to https://phabricator.wikimedia.org/P13300 and previous config saved to /var/cache/conftool/dbconfig/20201117-085432-marostegui.json
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13299 and previous config saved to /var/cache/conftool/dbconfig/20201117-084744-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13298 and previous config saved to /var/cache/conftool/dbconfig/20201117-084733-root.json
* 08:43 marostegui: Truncate tendril.global_status_log - [[phab:T231185|T231185]]
* 08:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13297 and previous config saved to /var/cache/conftool/dbconfig/20201117-083241-root.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 80%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13296 and previous config saved to /var/cache/conftool/dbconfig/20201117-083229-root.json
* 08:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:22 volans: restart netbox on netbox1001 to test new logging configuration
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13295 and previous config saved to /var/cache/conftool/dbconfig/20201117-081737-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13294 and previous config saved to /var/cache/conftool/dbconfig/20201117-081726-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13293 and previous config saved to /var/cache/conftool/dbconfig/20201117-080234-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 60%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13292 and previous config saved to /var/cache/conftool/dbconfig/20201117-080222-root.json
* 07:58 XioNoX: codfw row D: Convert LVS ranges to individual interfaces
* 07:54 XioNoX: codfw row D: explicitly set access ports to "interface-mode access"
* 07:49 XioNoX: split codfw row D ganeti switch ports out of the interface group
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13291 and previous config saved to /var/cache/conftool/dbconfig/20201117-074730-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13290 and previous config saved to /var/cache/conftool/dbconfig/20201117-074719-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13289 and previous config saved to /var/cache/conftool/dbconfig/20201117-073227-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 30%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13288 and previous config saved to /var/cache/conftool/dbconfig/20201117-073216-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 100%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13287 and previous config saved to /var/cache/conftool/dbconfig/20201117-073057-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 100%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13286 and previous config saved to /var/cache/conftool/dbconfig/20201117-073032-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13285 and previous config saved to /var/cache/conftool/dbconfig/20201117-071723-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13284 and previous config saved to /var/cache/conftool/dbconfig/20201117-071712-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 75%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13283 and previous config saved to /var/cache/conftool/dbconfig/20201117-071553-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 75%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13282 and previous config saved to /var/cache/conftool/dbconfig/20201117-071529-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13281 and previous config saved to /var/cache/conftool/dbconfig/20201117-070220-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 20%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13280 and previous config saved to /var/cache/conftool/dbconfig/20201117-070209-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 50%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13278 and previous config saved to /var/cache/conftool/dbconfig/20201117-070050-root.json
* 07:00 marostegui: Stop mysql on db1124: s1 and s3, this will generate lag on enwiki and s3 on labsdb - [[phab:T267090|T267090]]
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 50%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13277 and previous config saved to /var/cache/conftool/dbconfig/20201117-070025-root.json
* 06:51 marostegui: Upgrade db1077 and pc2010 to 10.4.17
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Slowly pool es1034 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13276 and previous config saved to /var/cache/conftool/dbconfig/20201117-064716-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Slowly pool es1033 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13275 and previous config saved to /var/cache/conftool/dbconfig/20201117-064705-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 25%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13274 and previous config saved to /var/cache/conftool/dbconfig/20201117-064546-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 25%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13273 and previous config saved to /var/cache/conftool/dbconfig/20201117-064522-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1034 with minimum weight on es3 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13272 and previous config saved to /var/cache/conftool/dbconfig/20201117-063933-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1033 with minimum weight on es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13271 and previous config saved to /var/cache/conftool/dbconfig/20201117-063805-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 10%: Slowly pool es1019 after cloning es1034 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13270 and previous config saved to /var/cache/conftool/dbconfig/20201117-063043-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 10%: Slowly pool es1015 after cloning es1033 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13269 and previous config saved to /var/cache/conftool/dbconfig/20201117-063019-root.json
* 02:37 dwisehaupt: shifted portion of thank you emails flowing through frmx's to 60% of the total volume
* 01:59 eileen_: civicrm revision is {{Gerrit|b6fe8bd791}}, config revision is {{Gerrit|61e2000391}}
 
== 2020-11-16 ==
* 23:28 mutante: cumin1001 - sudo systemctl start cumin-check-aliases (to confirm switching cron to timer worked) [[phab:T265138|T265138]]
* 22:22 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 22:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 22:09 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:06 mutante: planet - fixed updates of uk.planet which failed due to non-ASCII chars in a URL - since updates are systemd timers now that affects the entire systemd state monitoring
* 21:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 21:40 rzl@cumin1001: conftool action : set/weight=1; selector: name=mw2250.codfw.wmnet,cluster=videoscaler,service=canary
* 21:38 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet,cluster=jobrunner
* 21:30 mutante: peek2001 - mv /var/lib/peek/git to git.old ; run puppet ; let it fix git checkout
* 21:07 rzl: disable puppet on jobrunners [[phab:T264991|T264991]]
* 20:40 mutante: planet1002/planet2002 - delete entire crontab of user planet, drop update cronjobs after switching to systemd timers with gerrit:636105 ([[phab:T265138|T265138]])
* 20:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 mutante: releases2002 systemctl reset-failed should clear Icinga systemd alert after gerrit:641228
* 20:05 dwisehaupt: disabling process-control jobs and moving to maintenance mode for maint window
* 19:57 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint (duration: 02m 27s)
* 19:51 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint
* 19:48 effie: disable puppet on parsoid servers - [[phab:T264991|T264991]]
* 19:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 18:59 mutante: mw2255 - is pooled and puppet works on next run, after it removed php 7.2 config files
* 18:56 mutante: running puppet on mw2313 and mw2255 which were listed in puppetboard as failed puppet runs
* 18:15 rzl: disable puppet on 'A:mw-api and not A:mw-api-canary' [[phab:T264991|T264991]]
* 18:05 effie: disable puppet on all appservers
* 17:48 elukey: enable and run puppet on kafka-main2003 (it will start kafka services) - [[phab:T267865|T267865]]
* 17:42 dwisehaupt: frmon1001 upgraded to buster
* 17:36 volans: moved interfaces in Netbox from old to new switch - [[phab:T267865|T267865]]
* 17:24 vgutierrez: switching back from lvs2010 to lvs2007 - [[phab:T267865|T267865]]
* 17:21 vgutierrez: repooling cp2037 and cp2038 - [[phab:T267865|T267865]]
* 16:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 16:16 XioNoX: update c7 serial in row C VC config - [[phab:T267865|T267865]]
* 16:16 rzl: disable puppet on A:mw-api-canary [[phab:T264991|T264991]]
* 16:14 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 16:08 effie: disable puppet in appservers canaries to install ICU 63 - [[phab:T264991|T264991]]
* 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet
* 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2037.codfw.wmnet
* 16:06 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 16:03 hnowlan: joined maps2006 to maps codfw cassandra cluster
* 16:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:57 hnowlan: roll-restarting eqiad restbase for java security updates
* 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:40 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:40 cdanis@cumin1001: START - Cookbook sre.network.cf
* 14:16 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 in pc1 after restarting mysql [[phab:T266483|T266483]] (duration: 00m 59s)
* 14:06 marostegui: Restart pc1007's mysql [[phab:T266483|T266483]]
* 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 and place pc1010 instead of it [[phab:T266483|T266483]] (duration: 01m 00s)
* 13:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 13:00 kormat: running schema change against s1 in codfw [[phab:T259831|T259831]]
* 12:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 moritzm: installing tcpdump security updates
* 12:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:25 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 12:25 hnowlan: roll-restarting restbase-codfw
* 12:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 hnowlan: roll restarting sessionstore for java updates
* 11:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 11:13 moritzm: installing poppler security updates
* 10:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
* 10:44 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
* 09:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
* 09:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 08:39 godog: centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc
* 08:35 moritzm: installing codemirror-js security updates
* 08:32 XioNoX: asw-c-codfw> request system power-off member 7 - [[phab:T267865|T267865]]
* 08:24 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s)
* 08:23 joal@deploy1001: Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb]
* 08:23 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s)
* 08:13 joal@deploy1001: Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb]
* 08:08 XioNoX: asw-c-codfw> request system power-off member 7 - [[phab:T267865|T267865]]
* 06:35 marostegui: Stop replication on s3 codfw master (db2105) for MCR schema change deployment [[phab:T238966|T238966]]
* 06:14 marostegui: Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - [[phab:T261717|T261717]]
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json
* 06:02 marostegui: Restart mysql on db1115 (tendril/dbtree) due to memory usage
* 00:55 shdubsh: re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - [[phab:T267865|T267865]]
* 00:19 elukey: run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - [[phab:T267865|T267865]]
* 00:09 elukey: sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - [[phab:T267865|T267865]]
 
== 2020-11-15 ==
* 22:10 cdanis: restart some purgeds in ulsfo as well [[phab:T267865|T267865]] [[phab:T267867|T267867]]
* 22:03 cdanis: [[phab:T267867|T267867]] [[phab:T267865|T267865]] on cumin1001: sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged'
* 14:00 cdanis: powercycling ms-be1022 via mgmt
* 11:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 vgutierrez: depooling lvs2007, lvs2010 taking over text traffic on codfw - [[phab:T267865|T267865]]
* 10:00 elukey: cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1
* 09:57 elukey: restart purged on cp4028 (consumer stuck due to kafka-main2003 down)
* 09:55 elukey: restart purged on cp4025 (consumer stuck due to kafka-main2003 down)
* 09:53 elukey: restart purged on cp4031 (consumer stuck due to kafka-main2003 down)
* 09:50 elukey: restart purged on cp4022 (consumer stuck due to kafka-main2003 down)
* 09:42 elukey: restart purged on cp2028 (kafka-main2003 is down and there are connect timeout errors)
* 09:07 Urbanecm: Change email for SUL user Botopol via resetUserEmail.php ([[phab:T267866|T267866]])
* 08:27 elukey: truncate -s 10g /var/lib/hadoop/data/n/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000177/stderr on an-worker1100
* 08:24 elukey: sudo truncate -s 10g /var/lib/hadoop/data/c/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000019/stderr on an-worker1098
 
== 2020-11-13 ==
* 22:06 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=myvwiki autopatrolled # [[phab:T105570|T105570]]
* 22:04 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki editor # [[phab:T105570|T105570]]
* 21:42 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwikinews reviewer # [[phab:T105570|T105570]]
* 21:40 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=bnwiki editor # [[phab:T105570|T105570]]
* 21:39 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki flood # [[phab:T105570|T105570]]
* 21:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=test2wiki upwizcampeditors # [[phab:T105570|T105570]]
* 21:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=aawiki communityapplica # [[phab:T105570|T105570]]
* 21:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwiki epadmin # [[phab:T105570|T105570]]
* 16:50 _joe_: manually rotate user.log on centrallog1001 and moved it to /srv/user.log.manual-rotation
* away: updated fundraising CiviCRM from {{Gerrit|f7954c6659}} to {{Gerrit|74d795408f}}
* 08:15 vgutierrez: restart acme-chief on acmechief1001
* 01:30 TimStarling: on mwmaint1002 running fixT260485.php unmerged fixup script from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/640348
 
== 2020-11-12 ==
* 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f0f8397424d4337cdcd61f7acb276d4f0b1facd}}: Enable "Cite" button in toolbar for enwiktionary ([[phab:T267504|T267504]]) (duration: 00m 58s)
* 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3ce18e6f63abe060c05c40239b651086f65a1a33}}: Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267784|T267784]]) (duration: 01m 00s)
* 16:12 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=jawiki; [[phab:T246539|T246539]])
* 16:11 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; [[phab:T246539|T246539]])
* 13:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; [[phab:T246539|T246539]])
* 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:35 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:30 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 11:12 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:08 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:02 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:19 hashar@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo: Revert "filerepo: clean up shared cache keys to avoid key metrics clutter" - [[phab:T267668|T267668]] (duration: 01m 01s)
* 09:12 hashar: Pulled https://gerrit.wikimedia.org/r/640746 on deployment server for # [[phab:T267668|T267668]]
* 03:46 ejegg: updated python fundraising tools from {{Gerrit|7853f426ee}} to {{Gerrit|68e054c9ad}}
 
== 2020-11-11 ==
* 16:44 XioNoX: Revert "temporarily route Italy to codfw"
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:30 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:52 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 14:29 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
* 13:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=cp3054.esams.wmnet
* 12:25 Lucas_WMDE: EU backport&config window done
* 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:640676{{!}}Remove propagateChangeVisibility repo setting]] (duration: 00m 58s)
* 12:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636453{{!}}Enable propagatePageDeletion on Wikidata]] (duration: 00m 59s)
* 12:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/DiscussionTools/includes/CommentParser.php: Backport: [[gerrit:640497{{!}}Fix getHeadlineNodeAndOffset() returning text nodes (T267284)]] (duration: 01m 01s)
* 10:34 XioNoX: delete unused interfaces from asw-d-codfw
* 09:53 XioNoX: prioritized DE-CIX IXP - [[phab:T262681|T262681]]
* 02:18 ryankemper: (WDQS deploy completed)
* 00:48 ryankemper: Restarting `wdqs-categories` one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:47 ryankemper: Restarted `wdqs-categories` across wdqs test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:47 ryankemper: Restarted `wdqs-updater` simultaneously across all wdqs hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:47 ryankemper: [wdqs deploy] following deploy, example query succeeds on `query.wikidata.org`, proceeding to post deploy steps
* 00:46 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@03219df]: 0.3.55 (duration: 11m 24s)
* 00:46 ryankemper: [[phab:T222669|T222669]] [Elasticsearch reindex] Began long-running reindex of cirrus elasticsearch for `codfw`, `eqiad`, and `cloudelastic`. 3 tmux sessions on `ryankemper@mwmaint1002`: `reindex_eqiad`, `reindex_codfw`, `reindex_cloudelastic`
* 00:38 ryankemper: Following deploy to canary `wdqs1003`, automated tests are passing as is a manual test of an example query. Proceeding...
* 00:34 ryankemper@deploy1001: Started deploy [wdqs/wdqs@03219df]: 0.3.55
* 00:32 ryankemper: About to begin wdqs deploy; before-deploy tests on canary `wdqs1003` are passing
* 00:09 eileen: civicrm revision changed from {{Gerrit|d0cd7f6dbb}} to {{Gerrit|e5d12cc46c}}, config revision is {{Gerrit|e2d133eff4}}
 
== 2020-11-10 ==
* 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 22:05 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:58 jgleeson: civicrm revision changed from {{Gerrit|c36a5cc1b1}} to {{Gerrit|d0cd7f6dbb}}
* 21:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:47 ebernhardson: unban elastic1050 from eqiad search psi cluster
* 21:28 cstone: civicrm revision changed from {{Gerrit|b1342c4129}} to {{Gerrit|c36a5cc1b1}}
* 21:24 brennen@deploy1001: sync-file aborted: Testing: README.md sync-file with ssh -n for [[phab:T223287|T223287]] (duration: 00m 37s)
* 21:23 brennen: testing some scap operations, modified to use ssh -n for debugging [[phab:T223287|T223287]]
* 21:11 ebernhardson: ban elastic1050 from eqiad psi cluster due to excessive load
* 21:02 brennen@deploy1001: Finished scap: Backport: [[gerrit:640487{{!}}language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488{{!}}Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]] (duration: 34m 46s)
* 20:27 brennen@deploy1001: Started scap: Backport: [[gerrit:640487{{!}}language: Honor $wgTranslateNumerals, even if PHP does digit translation(T267614)]] and [[gerrit:640488{{!}}Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)]]
* 20:10 brennen@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:640254{{!}}Turn on formatnum logging (T267587, T267370)]] (duration: 01m 02s)
* 19:06 hknust: holger mwmaint1002 Stop [[phab:T219279|T219279]]
* 18:31 hknust: holger mwmaint1002 Start [[phab:T219279|T219279]]
* 17:57 effie: pool mw1263 mw1264
* 17:31 effie: briefly depool mw1263 and mw1264
* 17:30 jynus: about to shutdown db1139 for hw maintenance [[phab:T261405|T261405]]
* 17:13 dwisehaupt: upping thank you mail flow through frmx's to 30% of the total runs
* 16:32 XioNoX: add cloud-storage1-b-codfw to, well, codfw switches - [[phab:T267378|T267378]]
* 16:20 effie: pool mw1263
* 16:17 hashar: Restarting Gerrit on gerrit1001
* 16:12 hashar: Restarted Gerrit on gerrit2001 for config change
* 15:53 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9 (duration: 01m 06s)
* 15:52 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9
* 15:38 moritzm: installing 4.19.152 kernel packages on buster hosts (only installing the package, reboots will happen separately)
* 15:28 effie: depool mw1263 - [[phab:T244340|T244340]]
* 15:09 ejegg: updated fundraising python tools from {{Gerrit|087a596d3a}} to {{Gerrit|7853f426ee}}
* 14:21 effie: pooling mw1276 - [[phab:T244340|T244340]]
* 13:51 moritzm: imported php-memcached 3.0.1+2.2.0-1~wmf3+buster1 to component/php72 for buster-wikimedia
* 13:29 marostegui: Restart db2093 to pick up report_host - [[phab:T266483|T266483]]
* 13:17 marostegui: Restart db1117* to pick up report_host - [[phab:T266483|T266483]]
* 12:46 effie: depool mw1276 to install onhost memcached - [[phab:T244340|T244340]]
* 12:33 Lucas_WMDE: EU backport&config window done
* 12:33 moritzm: installing wireshark security updates
* 12:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:636095{{!}}Switch parser cache to using "mcrouter-with-onhost-tier" (T264604)]] (duration: 00m 57s)
* 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/mc.php: Config: [[gerrit:636094{{!}}Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches (T264604)]] (duration: 00m 57s)
* 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/Wikibase: Backport: [[gerrit:639035{{!}}Revert JS parser commits (T266671)]] (duration: 01m 04s)
* 08:59 hashar: Restarted Gerrit for plugins deployment
* 08:06 hashar: Restarting Gerrit on gerrit2001 / gerrit-replica
* 08:04 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - [[phab:T184086|T184086]] (duration: 00m 10s)
* 08:04 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - [[phab:T184086|T184086]]
* 07:40 elukey: import hue_4.8.0-2 to buster-wikimedia
* 06:53 marostegui: Restart dbstore* to pick up report_host - [[phab:T266483|T266483]]
* 06:44 marostegui: Restart pc1010 to pick up report_host - [[phab:T266483|T266483]]
 
== 2020-11-09 ==
* 22:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:14 mbsantos@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]]) (duration: 02m 23s)
* 21:11 mbsantos@deploy1001: Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs ([[phab:T222377|T222377]])
* 20:53 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=maps2002.*
* 20:36 cdanis: depool maps2002
* 20:26 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 01m 09s)
* 20:25 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 20:24 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]]) (duration: 11m 36s)
* 20:13 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs ([[phab:T223041|T223041]] [[phab:T222377|T222377]] [[phab:T255932|T255932]])
* 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.16
* 20:04 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:01 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:58 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:32 mepps: updated payments-wiki from {{Gerrit|388490e86d}} to {{Gerrit|8612ed1002}}, config revision is {{Gerrit|987e839869}}
* 17:53 XioNoX: re-order asw-d-codfw interfaces-ranges
* 17:51 XioNoX: standardize asw-d-codfw interfaces descriptions
* 17:33 effie: updating mwdebug2002 to ICU 63 - [[phab:T264991|T264991]]
* 17:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 05s)
* 16:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 16:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 16:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 16:40 moritzm: imported 2.0.2+0.5.7-1~wmf3+php72+buster1 to component/php72 for buster-wikimedia
* 16:34 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=trwiki; [[phab:T246539|T246539]])
* 16:34 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 16:20 XioNoX: Netbox prod: mass import from PuppetDB (cables, etc) - [[phab:T262899|T262899]]
* 16:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:12 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62c2e02f836095ba7e8c7b80d97a52aee885b619}}: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis ([[phab:T266298|T266298]]) (duration: 01m 07s)
* 14:34 hashar: Restarting Gerrit
* 14:07 hashar@deploy1001: Finished deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]] (duration: 00m 18s)
* 14:07 hashar@deploy1001: Started deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # [[phab:T232678|T232678]]
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:03 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; [[phab:T246539|T246539]])
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:59 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:40 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:13 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwikinews --fix --add-prefix=BROKEN # [[phab:T266925|T266925]]
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11b8f6236d159962bdebccd6dcacb72e600ec6b5}}: Add wgNamespaceAliases for zhwikinews ([[phab:T266925|T266925]]) (duration: 01m 06s)
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87b3eede24fb407ddd226ad65817ab8adf44aeb8}}: Enable DiscussionTools as a beta feature on fiwiki ([[phab:T265446|T265446]]) (duration: 01m 06s)
* 11:58 moritzm: installing remaining openldap updates on stretch
* 11:57 jynus: restart dbstore1004 mariadb instances
* 10:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:36 XioNoX: add 185.15.56.240/29 IPs to relevant cloudsw interfaces - [[phab:T265288|T265288]]
* 10:35 effie: merging 638109 and roll restart ms-fe* hosts to pick up the change
* 10:11 XioNoX: renumber cloud-xlink1-eqiad
* 09:56 Urbanecm: Purge https://vote.wikimedia.org/wiki/Main_Page ([[phab:T262689|T262689]])
* 09:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=svwiki; [[phab:T246539|T246539]])
* 09:52 hashar: Restarting Gerrit on gerrit1001 and gerrit2001 so that the JVM exits after OutOfMemory # [[phab:T267517|T267517]]
* 09:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b0a81f4294dcedfd5736884900cb561de9a080e}}: Revert "Change votewiki language temporarily to fa for fawiki elections" ([[phab:T262689|T262689]]) (duration: 01m 08s)
* 09:37 moritzm: installing libexif security updates
* 09:06 godog: enable thanos query-frontend on thanos-fe hosts - [[phab:T261281|T261281]]
* 08:24 XioNoX: configure traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 08:11 hashar: Restarting Gerrit on gerrit1001 and gerrit2001
* 07:58 hashar: Restarted CI Jenkins on contint2001 for Java upgrade
* 07:17 elukey: restart gerrit on gerrit2001 (OOM registered two days ago, but systemctl reports the unit up since a month ago, so it is probably in a weird state)
* 01:35 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/tests/phpunit/maintenance/categoryChangesAsRdfTest.php: this was cherry-picked to make CI pass, pushing it out just for a clean staging dir (duration: 01m 06s)
* 01:32 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 01:30 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 07s)
* 01:29 tstarling@deploy1001: sync-file aborted: fixing UBN [[phab:T266903|T266903]] (duration: 00m 01s)
 
== 2020-11-08 ==
* 23:08 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.api/upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 06s)
* 23:06 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.Upload.js: fixing UBN [[phab:T266903|T266903]] (duration: 01m 35s)
* 20:34 cdanis: repool esams
* 19:48 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:48 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:16 cdanis: depool esams
* 18:35 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:35 cdanis@cumin1001: START - Cookbook sre.network.cf
 
== 2020-11-06 ==
* 23:38 dwisehaupt: frdata1001 upgraded to buster
* 22:40 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
* 22:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
* 22:29 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
* 22:29 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
* 20:57 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 05s)
* 20:56 reedy@deploy1001: Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: [[phab:T267278|T267278]] (duration: 01m 10s)
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 cwhite@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 17:02 dwisehaupt: rolled out new thank_you_mail_send process_control scripts to utilize frmx hosts
* 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
* 14:46 moritzm: installing wireshark security updates
* 14:36 hnowlan: resyncing database on maps1001
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:05 hnowlan: started cassandra bootstrap of maps2005
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 hnowlan: joining maps2005 to cassandra cluster
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 moritzm: uploaded openjdk-8  8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
* 10:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:06 dcausse: restarted elastic on elastic1063 ([[phab:T265113|T265113]])
* 09:57 moritzm: installing spice security updates
* 09:32 moritzm: installing libsndfile security updates
* 09:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:14 moritzm: installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
* 04:38 ryankemper: [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
* 04:36 ryankemper: Finished restarting wdqs categories one host at a time across all wdqs production instances
* 04:02 ryankemper: Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
* 04:01 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:01 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:00 ryankemper: `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
* 03:59 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
* 03:51 ryankemper: Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
* 03:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
* 03:48 ryankemper: About to begin wdqs deploy, tests passing on canary `wdqs1003`
* 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]]) (duration: 69m 02s)
 
== 2020-11-05 ==
* 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]])
* 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - FormatMetData.php (T267370)]] (duration: 07m 22s)
* 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370)]] (duration: 01m 08s)
* 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: [[gerrit:639504{{!}}Bump wikimedia/parsoid to 0.13.0-a16 (T267146)]] (duration: 01m 14s)
* 20:54 hnowlan: reenabled tilerator in eqiad
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
* 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
* 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 20:39 hnowlan: finished removenode of maps2002 cassandra
* 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
* 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: [[gerrit:639500{{!}}Dont double-format numeric edit count (T267362)]] (duration: 01m 06s)
* 19:44 Urbanecm: Morning B&C window done
* 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: {{Gerrit|81cb1c7b141d49d7fc931fdc13ffd1b48b3a25ab}}: Suggested edits: Export task count from start editing dialog ([[phab:T266868|T266868]]; [[phab:T263040|T263040]]) (duration: 01m 07s)
* 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|453b9c64c44a256eafdfafe7a0023484377bbbd2}}: Fix DiscussionTools wikis config for thwiki/tgwiki ([[phab:T266303|T266303]]) (duration: 01m 08s)
* 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
* 17:52 akosiaris: restart uwsgi-ores in all ores1* nodes per complaint on IRC that max redis clients have been reached [[phab:T263910|T263910]]
* 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
* 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
* 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 17:41 brennen: train is currently unblocked; rolling to group0 ([[phab:T263182|T263182]])
* 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: [[gerrit:639491{{!}}language: Clean up $separatorTransformTable in km/la/my (T267091)]] (duration: 01m 12s)
* 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: [[gerrit:639495{{!}}mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311)]] (duration: 01m 10s)
* 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
* 17:14 hnowlan: rebuilding cassandra on maps2002
* 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
* 17:05 hnowlan: restarting maps2004 postgres for config change
* 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
* 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:41 moritzm: installing junit4 security updates
* 14:55 elukey: shut down kafka-jumbo1001 to swap NICs (1g -> 10g)
* 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond42: re-enable puppet fleet-wide after restarting puppetdb
* 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 jbond42: disable puppet fleet wide to restart puppetdb
* 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:52 jbond42: upgrade freetype on jessie
* 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
* 12:09 marostegui: Upgrade mysql on pc2010
* 11:58 jynus: shutting down db1139 in preparation of maintenance [[phab:T261405|T261405]]
* 11:55 marostegui: Upgrade mysql on db1077
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
* 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; [[phab:T246539|T246539]])
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - [[phab:T262512|T262512]]
* 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
* 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
* 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimium weight after being cloned from es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimium weight after being cloned from es1013 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimium weight after being cloned from es1016 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T267216|T267216]]', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
* 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 due to partition 'e' being full
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
* 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
* 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
* 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
* 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]
 
== 2020-11-04 ==
* 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
* 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee0ba541fa55f6707276fdc5bd3f032cb9be3e60}}: Disable the search in header A/B test ([[phab:T265333|T265333]]) (duration: 01m 06s)
* 20:33 ejegg: updated payments-wiki from {{Gerrit|1ad4ba9639}} to {{Gerrit|388490e86d}}
* 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 01m 07s)
* 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|82579bf9d71bd3c9d97da0132ce8d92a8863da5b}}: Enable wgImagePreconnect on remaining wikis ([[phab:T123582|T123582]]) (duration: 01m 06s)
* 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2a57725f8f6fdaa3f40c834e84b43a0260077f2}}: Enable DiscussionTools as a beta feature on almost all Wikipedias ([[phab:T266303|T266303]]) (duration: 01m 07s)
* 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fb5c03262c20b5e99b3c2f6e91abb024f12da1f5}}: Enable wgCheckUserLogLogins at all wikis but loginwiki ([[phab:T253802|T253802]]) (duration: 01m 08s)
* 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
* 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
* 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
* 18:51 Urbanecm: Strip 2FA for Mark83 at SUL ([[phab:T267257|T267257]])
* 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
* 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
* 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
* 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
* 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
* 17:07 effie: Reimage mc1036 for real this time
* 16:40 brennen: 1.36.0-wmf.16 was branched at {{Gerrit|f51ccd2ccef8cba0e7d874b6f7cf4b73bcd36636}} for [[phab:T263182|T263182]]
* 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 15:39 effie: Reimage mc1036 to buster - [[phab:T252391|T252391]]
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - [[phab:T259163|T259163]] (duration: 00m 58s)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 00m 59s)
* 14:37 jynus: restart mysql at db1133 [[phab:T266483|T266483]]
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
* 14:15 jynus: restart mysqls at db209[789],db210[01], db2139, db2141 [[phab:T266483|T266483]]
* 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 jynus: restart mysqls at db1150 [[phab:T266483|T266483]]
* 13:54 jynus: restart mysqls at db1145 [[phab:T266483|T266483]]
* 13:51 jynus: restart mysqls at db1140 [[phab:T266483|T266483]]
* 13:47 jynus: restart mysqls at db1139 [[phab:T266483|T266483]]
* 13:43 jynus: restart mysqls at db1116 [[phab:T266483|T266483]]
* 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 jynus: restart mysqls at db1102 [[phab:T266483|T266483]]
* 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:35 jynus: restart mysqls at db1095 [[phab:T266483|T266483]]
* 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:50 Lucas_WMDE: EU backport&config done
* 12:11 Urbanecm: Run scap pull at snapshot1010 manually
* 12:09 Urbanecm: scap sync-file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ed3c43dc4488205663e6694b7ddfa991e3f3d4b9}}: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267193|T267193]]) (duration: 01m 02s)
* 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
* 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; [[phab:T246539|T246539]])
* 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
* 10:05 _joe_: restarting envoyproxy on restbase20<nowiki>{</nowiki>09,10<nowiki>}</nowiki> to test poolcounter usage by the safe restart scripts
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
* 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
* 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 [[phab:T261717|T261717]]
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
* 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
* 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - [[phab:T253438|T253438]]
* 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json
 
== 2020-11-03 ==
* 22:56 _joe_: repooling mw1346
* 22:55 _joe_: depooling mw1346
* 22:49 cdanis: mw1342 restart-php7.2-fpm
* 22:37 cdanis: repool mw1278 and mw1279
* 22:35 cdanis: sudo restart-php7.2-fpm on mw1290.eqiad.wmnet
* 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
* 22:31 cdanis: depool mw1276 and mw1279 also
* 22:25 cdanis: sudo depool on mw1278.eqiad.wmnet
* 21:16 hashar: Gerrit: triggering Java garbage collection # [[phab:T263008|T263008]]
* 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
* 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM [[phab:T265113|T265113]]
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
* 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
* 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
* 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
* 13:43 sobanski: Removing db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
* 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
* 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
* 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 03s)
* 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:23 hnowlan: resyncing postgres replica maps1001
* 11:03 Amir1: rolling restart of ores
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 07s)
* 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 26s)
* 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
* 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
* 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
* 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:32 godog: Prometheus re-enable compactions - [[phab:T261281|T261281]]
* 06:59 marostegui: Remove db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl [[phab:T267088|T267088]]', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
* 06:46 marostegui: Deploy schema change on s1 codfw master: [[phab:T265349|T265349]]
* 06:16 marostegui: Stop MySQL on es1014 to clone es1028 [[phab:T261717|T261717]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
* 06:11 marostegui: Stop MySQL on es1012 to clone es1027 [[phab:T261717|T261717]]
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
* 06:04 marostegui: Stop MySQL on es1011 to clone es1026 [[phab:T261717|T261717]]
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
* 04:39 cstone: civicrm revision changed from {{Gerrit|cd13d9e30f}} to {{Gerrit|b1342c4129}}
* 02:13 shdubsh: restart ES on logstash1009 - oom killed
* 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime
 
== 2020-11-02 ==
* 22:19 twentyafterfour: restart php7.3-fpm on phab1001
* 22:03 twentyafterfour: applied {{Gerrit|113a244a66}} on phab1001 to hotfix [[phab:T240862|T240862]]
* 20:22 eileen: process-control config revision is {{Gerrit|313a36312f}} re-enable thank you
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 19:47 eileen: civicrm revision changed from {{Gerrit|3317d30356}} to {{Gerrit|cd13d9e30f}}, config revision is {{Gerrit|db912e3bba}}
* 19:45 eileen: process-control config revision is {{Gerrit|db912e3bba}} - thank-you job off for testing
* 19:07 Urbanecm: Deployed security fix for [[phab:T205908|T205908]]
* 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
* 18:59 mutante: decom'ing testvm1001
* 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 XioNoX: push new pfw policies - [[phab:T267051|T267051]]
* 16:39 ejegg: updated payments-wiki from {{Gerrit|adc3369cb3}} to {{Gerrit|1ad4ba9639}}
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
* 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
* 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - [[phab:T266024|T266024]] (duration: 00m 58s)
* 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:40 elukey: roll restart zookeeper on an-conf* to pick up new openjdk upgrades
* 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:03 Lucas_WMDE: EU backport&config window done
* 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: [[gerrit:637801{{!}}Revert JS parser commits (T266671)]] (duration: 01m 09s)
* 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637819{{!}}Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917)]] (duration: 00m 58s)
* 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 2/2 (Beta) (duration: 00m 57s)
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 1/2 (production) (duration: 01m 02s)
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:638020{{!}}Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]] (duration: 00m 58s)
* 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
* 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]], Beta part (prod no-op) (duration: 00m 58s)
* 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]] (duration: 00m 59s)
* 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
* 11:51 effie: disable puppet on thumbor1001 and thumbor1002 to test 636024
* 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
* 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - [[phab:T261281|T261281]]
* 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 10:23 moritzm: installing openldap security updates on corp LDAP replicas
* 08:46 XioNoX: add uRPF strict to ulsfo office links - [[phab:T266561|T266561]]
* 08:41 moritzm: installing openldap security updates on LDAP replicas
* 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - [[phab:T261281|T261281]]
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
 
== 2020-11-01 ==
* 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # [[phab:T266976|T266976]]
* 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
* 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
* 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
* 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
* 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json
 
== 2020-10-31 ==
* 00:12 mutante: removed Nuria from wmf group; she is already in the nda group ([[phab:T266086|T266086]])
 
== 2020-10-30 ==
* 23:35 foks: removing two files for legal compliance
* 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet [[phab:T266702|T266702]]
* 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - [[phab:T266164|T266164]]
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 20:56 mutante: mw1267,mw1268 - scap pull
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple of hosts that are down for maintenance
* 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]] (duration: 05m 14s)
* 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]]'
* 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
* 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:19 effie: disable puppet on mc1036 and mc2036 - [[phab:T252391|T252391]]
* 17:18 effie: enable puppet on all mediawiki and mc* hosts
* 16:19 elukey: kafka-jumbo1006 still running with 1g NIC
* 15:36 effie: stopping puppet on mediawiki and mc* hosts
* 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 rzl: downtiming mc2036 for buster reimage
* 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
* 14:14 cmjohnson1: moving mw1267 and mw1268 to rack A8 eqiad [[phab:T266164|T266164]]
* 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
* 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the birthday (duration: 01m 12s)
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:54 elukey: decom an-tool1006 (old analytics test vm) - [[phab:T255139|T255139]]
* 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission
 
== 2020-10-29 ==
* 23:59 eileen: process-control config revision is {{Gerrit|6891d35bce}}
* 23:39 Urbanecm: Evening B&C window done
* 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # [[phab:T266605|T266605]] # P13112
* 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddb7e08e9c1d07f704c9f7585d8b6089f1895b5c}}: Add namespace aliases to Turkish Wikiquote ([[phab:T266605|T266605]]) (duration: 00m 57s)
* 23:36 eileen: process-control config revision is {{Gerrit|1114512f90}}
* 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # [[phab:T266606|T266606]] # P13111
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3a8555154673c4c5a65f6ec2a1219d0832f48e0}}: Add namespace aliases to Turkish Wikisource ([[phab:T266606|T266606]]) (duration: 00m 56s)
* 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # [[phab:T266608|T266608]]
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1800d11ec8c07ff6ccffe0fd03ce11e6786f8a6e}}: Add namespace aliases to Turkish Wikibooks ([[phab:T266608|T266608]]) (duration: 00m 57s)
* 23:22 eileen: civicrm revision changed from {{Gerrit|e1d65b0f3a}} to {{Gerrit|3317d30356}}, config revision is {{Gerrit|d70fe02cb9}}
* 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix    # [[phab:T266609|T266609]]
* 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|090f75730727e7a3ca5a85af0ff9071213dd047f}}: Add namespace aliases to Turkish Wiktionary ([[phab:T266609|T266609]]) (duration: 00m 58s)
* 22:35 mutante: mw1268 - depooled for [[phab:T266164|T266164]]
* 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scap proxy was removed - deploy1001 scap-proxies dsh group was adjusted
* 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically ([[phab:T266164|T266164]])
* 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install [[phab:T263284|T263284]]
* 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
* 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:06 mutante: depooled mw1267 ([[phab:T266164|T266164]])
* 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
* 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; [[phab:T246539|T246539]])
* 19:13 Amir1: rolling restart of ores uwsgi
* 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote ([[phab:T266744|T266744]]) (duration: 00m 57s)
* 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # [[phab:T266744|T266744]]
* 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b7eaaab81e1665c478f5dc1fdb495e36c53e7863}}: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální ([[phab:T245639|T245639]]) (duration: 00m 57s)
* 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:29 hashar: Restarted CI Jenkins a bit ago
* 17:15 hashar: CI: killed all Java agents (Java upgrade)
* 17:12 hashar: Stopping CI Jenkins
* 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - [[phab:T265288|T265288]]
* 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
* 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - [[phab:T265288|T265288]]
* 16:34 XioNoX: force VRRP master on cr1-eqiad - [[phab:T265288|T265288]]
* 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
* 15:22 moritzm: installing bacula updates from Buster point release
* 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: {{Gerrit|483c3bceb926ac6a2cfc40112fb9b4f0671fef72}}: Attempt to add a query cache to DPL ([[phab:T263220|T263220]]) (duration: 00m 58s)
* 15:16 papaul: poweroff mc2029 for relocation
* 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|19c5aff02c20812c56b8abdcc0ed530393010193}}: Set wgDLPQueryCacheTime to 120 at all wikis ([[phab:T263220|T263220]]) (duration: 00m 59s)
* 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
* 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - [[phab:T265911|T265911]]
* 14:59 papaul: poweroff sessionstore2002 for relocation
* 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
* 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 14:08 godog: bump FS for prometheus codfw global instance
* 13:54 elukey: roll out profile::java on all zookeeper instances
* 13:53 moritzm: installing Java 11 security updates
* 13:52 bblack: authdns1001 - restart gdnsd - [[phab:T266746|T266746]]
* 13:46 bblack: authdns2001 - restart gdnsd - [[phab:T266746|T266746]]
* 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 13:25 Urbanecm: Correction: the session is on mwmaint1002, not mwmaint2001 ([[phab:T246539|T246539]])
* 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; [[phab:T246539|T246539]])
* 13:21 moritzm: installing bluez security updates on stretch
* 12:56 marostegui: Make orchestrator discover pc2 [[phab:T266485|T266485]]
* 12:55 marostegui: Deploy orchestrator grants on pc2 [[phab:T266485|T266485]]
* 12:44 marostegui: Deploy grants for cluster alias on pc1 [[phab:T266485|T266485]]
* 12:35 moritzm: upgrade idp-test* hosts to latest Java security updates
* 12:35 moritzm: restart idp-test
* 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
* 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 11:14 Urbanecm: EU B&C window done
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|28152b7387082b79d71cfbf28be740ffe629ee50}}: Add another SDC property to search for matching media statements ([[phab:T264925|T264925]]) (duration: 00m 58s)
* 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
* 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
* 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
* 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
* 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - [[phab:T266746|T266746]]
* 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) [[phab:T264109|T264109]]
* 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm)
* 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - [[phab:T258405|T258405]]
* 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
* 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
* 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
* 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
* 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
* 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 [[phab:T266663|T266663]]
* 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
* 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 [[phab:T266663|T266663]]
* 07:50 vgutierrez: restart haproxy on authdns2001
* 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 [[phab:T266663|T266663]]
* 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 [[phab:T266663|T266663]]
* 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
* 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
* 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
* 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 [[phab:T266663|T266663]]
* 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 [[phab:T266663|T266663]]
* 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 [[phab:T266663|T266663]]
* 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full
* 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 [[phab:T266663|T266663]]
* 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 [[phab:T266663|T266663]]
* 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 [[phab:T266663|T266663]]
* 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 [[phab:T266663|T266663]]
* 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore ([[phab:T257906|T257906]])
* 01:17 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
* 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
* 00:14 Amir1: rolling restart of ores
* 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
* 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)
 
== 2020-10-28 ==
* 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
* 23:52 ryankemper@deploy1001: deploy aborted:  0.3.53 (duration: 00m 00s)
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]:  0.3.53
* 22:54 mutante: scandium - scap pull after reinstalling OS
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
* 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
* 19:56 jgleeson: updated Smashpig from {{Gerrit|2246685626}} to {{Gerrit|09f29c1da5}}
* 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:56 tgr_: Morning deploys done
* 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983{{!}}Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)
* 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791{{!}}Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)
* 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956{{!}}Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)
* 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787{{!}}Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)
* 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880{{!}}Removing obsolete license definition]] (duration: 01m 00s)
* 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:30 hnowlan: reimporting OSM data for eqiad
* 17:24 hnowlan: removing OSM database on maps1004
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
* 16:16 hnowlan: Disabling tilerator in eqiad
* 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:51 Amir1: restarting uwsgi on ores in eqiad
* 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:10 godog: roll restart logstash5 in codfw
* 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 12:39 moritzm: installing libdatetime-timezone-perl updates
* 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - [[phab:T266561|T266561]]
* 10:39 ema: due to [[phab:T266651|T266651]], cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - [[phab:T266648|T266648]]
* 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
* 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:37 jynus: updated dump grants on db2093
* 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - [[phab:T257905|T257905]]
* 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 04:43 ryankemper: [[phab:T266492|T266492]] Finished rolling restart of codfw cirrus cluster
* 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 02:58 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
* 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
* 02:12 eileen: tools revision changed from {{Gerrit|a2a91d6c6a}} to {{Gerrit|087a596d3a}}
* 00:40 eileen: civicrm revision changed from {{Gerrit|4fdfb8408b}} to {{Gerrit|e1d65b0f3a}}, config revision is {{Gerrit|f16003ab62}}
 
== 2020-10-27 ==
* 22:20 mutante: systemctl reset-failed on various servers to see which are coming back later from failed auto_restart and which don't
* 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
* 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
* 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
* 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job ([[phab:T266024|T266024]])
* 19:40 eileen: civicrm revision changed from {{Gerrit|bb7c08bf6d}} to {{Gerrit|4fdfb8408b}}, config revision is {{Gerrit|f16003ab62}}
* 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
* 17:18 ejegg: updated payments-wiki from {{Gerrit|4c1503ad91}} to {{Gerrit|adc3369cb3}}
* 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:42 mepps: updated payments-wiki-staging from {{Gerrit|5fdd29bc16}} to {{Gerrit|4c1503ad91}}
* 15:25 ema: cp4032: downgrade varnish to 6.0.4 [[phab:T264398|T264398]]
* 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 [[phab:T266567|T266567]]
* 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 [[phab:T266567|T266567]]
* 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
* 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
* 14:27 _joe_: restart php-fpm on jobrunners in codfw
* 14:17 cdanis: ran puppet on alert1001
* 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
* 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago [[phab:T261487|T261487]]
* 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - [[phab:T265589|T265589]]
* 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - [[phab:T265589|T265589]]
* 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - [[phab:T265589|T265589]]
* 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - [[phab:T265589|T265589]]
* 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - [[phab:T265589|T265589]]
* 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
* 06:42 ryankemper: [[phab:T263970|T263970]] Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue
 
== 2020-10-26 ==
* 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set ([[phab:T266501|T266501]]) (duration: 01m 00s)
* 22:30 mutante: netflow5001 - systemctl reset-failed
* 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
* 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
* 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]]) (duration: 06m 53s)
* 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]])
* 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) ([[phab:T265556|T265556]]) (duration: 00m 57s)
* 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki ([[phab:T264990|T264990]]) (duration: 00m 58s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki ([[phab:T265690|T265690]]) (duration: 00m 58s)
* 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A ([[phab:T265372|T265372]], [[phab:T265556|T265556]]) (duration: 00m 58s)
* 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs ([[phab:T266285|T266285]]) (duration: 00m 58s)
* 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code ([[phab:T71729|T71729]]) (duration: 00m 58s)
* 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist ([[phab:T265558|T265558]]) (duration: 00m 59s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis ([[phab:T265556|T265556]]) (duration: 00m 58s)
* 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 17:48 mutante: an-worker109* - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:45 mutante: releases2002, netmon2001, various other hosts - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
* 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: [[phab:T265809|T265809]], {{Gerrit|I1011f63ae61f5a6}} (duration: 01m 00s)
* 16:41 XioNoX: bounce security log on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
* 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
* 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
* 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
* 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
* 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
* 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
* 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
* 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
* 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
* 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
* 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
* 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
* 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki (duration: 16m 43s)
* 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
* 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
* 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
* 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
* 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
* 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
* 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
* 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
* 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
* 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
* 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
* 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
* 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org [[phab:T265857|T265857]]
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bff6b37a55fe8f260fe00cbb942c53101167fb07}}: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T266390|T266390]]) (duration: 01m 14s)
* 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - [[phab:T265911|T265911]]
* 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
* 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - [[phab:T265911|T265911]]
* 10:18 godog: roll restart pybal to apply latest configuration
* 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
* 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
* 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:58 moritzm: installing freetype security updates for stretch
* 08:57 XioNoX: remove down sessions to AS38758
* 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:43 XioNoX: remove down sessions to AS8560
* 08:41 XioNoX: remove down sessions to AS31334
* 08:28 XioNoX: remove down sessions to AS6327
* 08:27 XioNoX: remove down sessions to AS8674
* 08:25 XioNoX: remove down sessions to AS24429
* 08:21 XioNoX: remove down sessions to AS16509
* 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
* 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
* 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
* 06:10 marostegui: Warm up tables [[phab:T261914|T261914]]
 
== 2020-10-25 ==
* 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
* 15:50 dwisehaupt: kernel upgrade and reboot for fran1001
 
== 2020-10-23 ==
* 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins ([[phab:T266086|T266086]])
* 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ [[phab:T266155|T266155]]
* 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - [[phab:T266023|T266023]] (forgot to log this earlier)
* 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - [[phab:T257905|T257905]]
* 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
* 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes [[phab:T264388|T264388]]
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
* 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - [[phab:T257905|T257905]]
 
== 2020-10-22 ==
* 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - [[phab:T257940|T257940]]
* 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded [[phab:T265963|T265963]]
* 21:56 mutante: deploy1002 - scap pull and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver ([[phab:T265963|T265963]])
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
* 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
* 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet [[phab:T265963|T265963]]
* 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand  to group1 ([[phab:T123582|T123582]]) (duration: 00m 56s)
* 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - [[phab:T258729|T258729]]
* 18:07 chaomodus: Updating eqiad public network DNS to automation
* 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - [[phab:T258729|T258729]]
* 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
* 17:46 chaomodus: Updating eqiad private network DNS to automation
* 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 17:21 bd808@cumin1001: Added views for new wiki: smnwiki [[phab:T264900|T264900]]
* 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
* 14:05 ottomata: bump camus version to wmf12 for all camus jobs. Should be a no-op now. - [[phab:T251609|T251609]]
* 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - [[phab:T251609|T251609]] (duration: 01m 02s)
* 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 [[phab:T264388|T264388]]
* 13:41 moritzm: pooling ldap-replica1001/1002 [[phab:T264388|T264388]]
* 13:10 moritzm: depooling ldap-replica2001/2002 [[phab:T264388|T264388]]
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
* 13:01 moritzm: pooling ldap-replica2004 [[phab:T264388|T264388]]
* 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - [[phab:T251609|T251609]] (duration: 01m 05s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|52ad2d4df1164dced684231c12aa64bd028b8ac9}}: Do not log logins at loginwiki via CU ([[phab:T253802|T253802]]) (duration: 01m 06s)
* 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 11:59 Lucas_WMDE: EU backport&config window done
* 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 2/2 (duration: 01m 04s)
* 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 1/2 (duration: 01m 02s)
* 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; [[phab:T246539|T246539]])
* 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
* 11:38 marostegui: Compare s1-s8 tables - [[phab:T261914|T261914]]
* 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: [[gerrit:635813{{!}}Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php]] (duration: 01m 08s)
* 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
* 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
* 10:43 moritzm: installing freetype security updates for stretch/buster
* 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
* 08:31 kormat: enabling replication from eqiad to codfw [[phab:T261914|T261914]]
* 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 03:37 eileen: civicrm revision changed from {{Gerrit|4dce7bf535}} to {{Gerrit|bb7c08bf6d}}, config revision is {{Gerrit|9a522d03dd}}
* 03:13 eileen: civicrm revision changed from {{Gerrit|3c3dcf80ae}} to {{Gerrit|4dce7bf535}}, config revision is {{Gerrit|9a522d03dd}}
* 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
* 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
* 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52
 
== 2020-10-21 ==
* 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: [[phab:T266033|T266033]] (duration: 01m 05s)
* 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: [[phab:T265751|T265751]] [[phab:T265754|T265754]] (duration: 01m 08s)
* 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting ([[phab:T257940|T257940]], [[phab:T257906|T257906]])
* 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 18:13 Urbanecm: Morning B&C window done
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|45312d359442d274e83deb7be80f86e12fb9e864}}: [WikibaseMediaInfo] Fix concept chips array nesting structure ([[phab:T256431|T256431]]) (duration: 01m 05s)
* 18:12 mepps: updated payments-wiki-staging from {{Gerrit|db03677b2d}} to {{Gerrit|5fdd29bc16}}
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d94e33ff39b300c74fcaf08d1746c089fb1af783}}: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
* 17:56 XioNoX: configure FB PNI in eqdfw
* 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, [[phab:T266021|T266021]] (duration: 01m 06s)
* 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
* 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
* 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
* 16:46 effie: restart php-fpm and pool mw2252 and mw2328
* 15:58 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
* 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
* 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia [[phab:T264388|T264388]]
* 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
* 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
* 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler [[phab:T242554|T242554]] (duration: 01m 07s)
* 14:34 dcausse: restarting blazegraph on codfw servers ([[phab:T263952|T263952]])
* 13:21 moritzm: pooling ldap-replica2003 [[phab:T264388|T264388]]
* 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
* 11:40 matthiasmullie: EU B&C done
* 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|785404fa2b998947d236aebe481ee1abcbd14220}}: Disable registrations stat on Special:TranslationStats ([[phab:T264158|T264158]]) (duration: 01m 05s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11567427c3f7d2908b29046ee56a7b0c0da32c09}}: Enable ContentTranslation in 5 Wikipedias as a default tool ([[phab:T264737|T264737]]; [[phab:T264738|T264738]]; [[phab:T264739|T264739]]; [[phab:T264740|T264740]]; [[phab:T264741|T264741]]) (duration: 01m 30s)
* 11:00 marostegui: Upgrade db2093's mariadb version [[phab:T266003|T266003]]
* 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; [[phab:T246539|T246539]])
* 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - [[phab:T258405|T258405]]
* 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; [[phab:T246539|T246539]]
* 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; [[phab:T246539|T246539]]
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # [[phab:T246539|T246539]]
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - [[phab:T266001|T266001]]
* 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - [[phab:T266001|T266001]]
* 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
* 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy
 
== 2020-10-20 ==
* 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
* 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
* 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
* 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
* 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
* 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact is that new updates will be delayed, but queries will still keep serving as normal, so fixing this is a priority but note that there's no availability outage
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
* 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:48 effie: depooling mw2328 - [[phab:T266052|T266052]]
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
* 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
* 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|fee2d3be13ae14d7ea51ff2db42090a1c27819bf}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 03s)
* 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|00ef00f59fd2a7a1366161ccc66c260be20e3e50}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 01s)
* 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: {{Gerrit|5eee9b773338e5181867cabec9faefbdeacf67ca}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 06s)
* 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: {{Gerrit|5f8d3de14c116b618f5226419082d5c9a07766fb}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 09s)
* 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
* 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - [[phab:T266001|T266001]]
* 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
* 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
* 11:37 liw: 1.36.0-wmf.14 was branched at {{Gerrit|1b7b5f716015f9303d37158820dadf759e8db707}} for [[phab:T263180|T263180]]
* 11:35 Lucas_WMDE: EU backport/config window done
* 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: [[gerrit:635030{{!}}SearchSatisfaction: Set isAnon field (T259250)]] (duration: 00m 57s)
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634039{{!}}Set Wikidata MF to collapse sections by default (T239195)]] (duration: 00m 56s)
* 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634938{{!}}Remove noratelimit from Wikidata bot group (T258354)]] (duration: 00m 56s)
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 09:59 dcausse: [[phab:T255399|T255399]]: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
 
== 2020-10-19 ==
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
* 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
* 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed [[phab:T265490|T265490]]
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
* 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
* 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:41 eileen: drush vset match_on_import 1
* 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
* 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
* 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
* 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
* 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
* 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
* 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:33 mutante: wtp2001 - sudo confctl decommission
* 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
* 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki ([[phab:T243445|T243445]], [[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18902aa75efafb7d56ca347c12781dbe59f2f8ad}}: Change votewiki language temporarily to fa for fawiki elections ([[phab:T262689|T262689]]) (duration: 00m 56s)
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki ([[phab:T243445|T243445]]) (duration: 00m 57s)
* 18:29 tzatziki: removing 10 files for legal compliance
* 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present ([[phab:T265654|T265654]]) (duration: 00m 58s)
* 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users ([[phab:T265556|T265556]]) (duration: 00m 56s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature ([[phab:T254349|T254349]]) (duration: 00m 56s)
* 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
* 15:31 elukey: update puppet compilers' facts
* 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
* 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
* 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
* 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
* 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` ([[phab:T246539|T246539]]; wikis.dblist is medium wikis from group2.dblist)
* 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia [[phab:T264388|T264388]]
* 12:48 moritzm: installing httpcomponents-client security updates on Buster
* 12:26 Urbanecm: Creation of smnwiki is done ([[phab:T264859|T264859]])
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
* 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - [[phab:T264900|T264900]]
* 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:15 marostegui: Deploy schema change on smnwiki [[phab:T265321|T265321]] [[phab:T264900|T264900]]
* 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki ([[phab:T264859|T264859]])
* 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
* 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
* 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` ([[phab:T246539|T246539]])
* 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:24 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce92c9814bf9c12cab1a9592dfb32f935d255d93}}: Restore bureaucrat abilities at uzwiki ([[phab:T265746|T265746]]) (duration: 00m 56s)
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26b97261f2b9d1991ea08fe32b6007ba6fe5088f}}: Disable EditorJourney (UnderstandingFirstDay) ([[phab:T252391|T252391]]) (duration: 01m 10s)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis ([[phab:T246539|T246539]])
* 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 ([[phab:T246539|T246539]])
* 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # [[phab:T246539|T246539]]
* 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
* 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - [[phab:T259780|T259780]]
* 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
* 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - [[phab:T263616|T263616]]
* 08:01 godog: re-enable compaction for prometheus[12]003 - [[phab:T261281|T261281]]
* 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
* 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
* 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27
 
== 2020-10-17 ==
* 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # [[phab:T264529|T264529]]
 
== 2020-10-16 ==
* 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 17:43 thcipriani: restarting gerrit due to gc thrashing
* 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
* 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
* 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
* 13:41 effie: pooling mw2279.codfw.wmnet [[phab:T264698|T264698]]
* 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping [[phab:T265571|T265571]] (duration: 01m 12s)
* 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 09:03 XioNoX: eqsin, push CR 634473
* 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
* 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings