
Difference between revisions of "Server Admin Log"

imported>Stashbot
(dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0))
imported>Stashbot
(mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version)
 
(437 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-07-30 ==
== 2021-12-04 ==
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:14 mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:54 mutante: rebooting mx2001
* 00:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:31 jynus: manually restarting clamav on otrs1001 after being killed
* 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime


== 2020-07-29 ==
== 2021-12-03 ==
* 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:29 cstone: revision changed from {{Gerrit|2c2e22cd}} to {{Gerrit|b82183b9}}
* 23:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet
* 17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=mswiktionary --fix ([[phab:T255391|T255391]])
* 17:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|396a395c79c606cb7deeb7906fefc7f16e63fa4f}}: Add several extra namespaces for mswiktionary ([[phab:T255391|T255391]]) (duration: 01m 07s)
* 17:35 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 22:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet
* 17:22 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet
* 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner2001.codfw.wmnet
* 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner2001.codfw.wmnet
* 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2 (duration: 00m 05s)
* 12:53 moritzm: installing nss security updates on stretch
* 20:35 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2
* 12:37 moritzm: draining primary/secondary instances off ganeti2007 [[phab:T296622|T296622]]
* 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next (duration: 01m 12s)
* 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 20:34 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next
* 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 19:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 19:44 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS buster
* 19:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS buster
* 19:41 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2011.codfw.wmnet with OS buster
* 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
* 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:06 jynus: stop and shutdown db1102 [[phab:T296546|T296546]]
* 19:29 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:27 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 moritzm: draining primary/secondary instances off ganeti2011 [[phab:T296622|T296622]]
* 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 19:19 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 19:18 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18019 and previous config saved to /var/cache/conftool/dbconfig/20211203-091537-marostegui.json
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18018 and previous config saved to /var/cache/conftool/dbconfig/20211203-090033-marostegui.json
* 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS buster
* 19:04 qchris: Restarting Gerrit on gerrit2001 (gerrit-replica) to make security fix effective.
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18017 and previous config saved to /var/cache/conftool/dbconfig/20211203-084528-marostegui.json
* 19:04 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001 (duration: 00m 09s)
* 08:44 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:03 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001
* 08:43 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 qchris: Restarting Gerrit on gerrit1001 to make security fix effective.
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18016 and previous config saved to /var/cache/conftool/dbconfig/20211203-083023-marostegui.json
* 19:00 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001 (duration: 00m 08s)
* 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 19:00 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18015 and previous config saved to /var/cache/conftool/dbconfig/20211203-082859-marostegui.json
* 18:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18014 and previous config saved to /var/cache/conftool/dbconfig/20211203-082848-marostegui.json
* 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18013 and previous config saved to /var/cache/conftool/dbconfig/20211203-081343-marostegui.json
* 18:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18012 and previous config saved to /var/cache/conftool/dbconfig/20211203-075839-marostegui.json
* 18:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18011 and previous config saved to /var/cache/conftool/dbconfig/20211203-074334-marostegui.json
* 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18010 and previous config saved to /var/cache/conftool/dbconfig/20211203-073910-marostegui.json
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:32 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18009 and previous config saved to /var/cache/conftool/dbconfig/20211203-073404-marostegui.json
* 18:13 Urbanecm: Morning B&C window is done
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18008 and previous config saved to /var/cache/conftool/dbconfig/20211203-071900-marostegui.json
* 18:13 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/DiscussionTools/: {{Gerrit|00ecec80d12a34977d55dd09bce0c5a1aab369f9}}: Revert new reply API for now ([[phab:T252558|T252558]]) (duration: 01m 06s)
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18007 and previous config saved to /var/cache/conftool/dbconfig/20211203-070355-marostegui.json
* 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d54f041be6508b641eec08e25287d280374cc863}}: Enable Translate extension at plwikimedia ([[phab:T259087|T259087]]) (duration: 01m 08s)
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18006 and previous config saved to /var/cache/conftool/dbconfig/20211203-064850-marostegui.json
* 18:07 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|a237f5b40c3662c0f08398abeeaadba61d7462f8}}: Move VisualEditor from beta to default on enwikiversity ([[phab:T258992|T258992]]) (duration: 01m 06s)
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 18:05 Urbanecm: Create tables for Translate extension in plwikimedia ([[phab:T259087|T259087]])
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18005 and previous config saved to /var/cache/conftool/dbconfig/20211203-062019-marostegui.json
* 18:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18004 and previous config saved to /var/cache/conftool/dbconfig/20211203-062011-marostegui.json
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18003 and previous config saved to /var/cache/conftool/dbconfig/20211203-060506-marostegui.json
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18002 and previous config saved to /var/cache/conftool/dbconfig/20211203-055001-marostegui.json
* 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18001 and previous config saved to /var/cache/conftool/dbconfig/20211203-053457-marostegui.json
* 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18000 and previous config saved to /var/cache/conftool/dbconfig/20211203-053032-marostegui.json
* 17:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 17:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:01 tgr: UTC late deploys done
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:00 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:743177{{!}}Add an image: Add test version of GEInfoboxTemplates (T291232)]] (duration: 00m 57s)
* 16:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:44 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.1-1_amd64.changes
* 16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:37 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step2 (duration: 00m 56s)
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:36 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Config/Validation/GrowthConfigValidation.php: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step 1 (duration: 00m 56s)
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617167: Revert "Set muswiki to read only" {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617167 ([[phab:T259004|T259004]]) (duration: 01m 06s)
* 15:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:33 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to 1.36.0-wmf.1"
* 15:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617152: Set muswiki to read only {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617152 ([[phab:T259004|T259004]]) (duration: 01m 08s)
* 15:10 jayme: imported docker-report_0.0.8-1 to buster-wikimedia
* 14:49 moritzm: installing ruby-json security updates
* 14:34 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 jbond42: install curl security update for jessie
* 14:29 moritzm: installing exiv2 security updates
* 14:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:55 volans: migrating *all* codfw mgmt DNS records to the autogenerated ones via Netbox - [[phab:T233183|T233183]]
* 13:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet
* 13:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:05 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.2 (duration: 01m 07s)
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.2
* 13:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:56 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:49 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 12:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:44 moritzm: imported curl 7.38.0-4+deb8u16+wmf1 to apt.wikimedia.org (jessie-wikimedia) [[phab:T259102|T259102]]
* 12:30 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 21s)
* 12:28 urbanecm@deploy1001: Synchronized langlist: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 05s)
* 12:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 03s)
* 12:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:24 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating avkwiki ([[phab:T257943|T257943]])
* 12:15 urbanecm@deploy1001: Synchronized dblists: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 05s)
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:07 moritzm: rebooting idp2001 for kernel update
* 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|252bb6c1bf83d96a14a0ef63e06eb544eef8a00b}}: Add Wikipedia wordmark for trwiki ([[phab:T255489|T255489]]; sync 2/2) (duration: 01m 05s)
* 11:39 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-tr.svg: {{Gerrit|252bb6c1bf83d96a14a0ef63e06eb544eef8a00b}}: Add Wikipedia wordmark for trwiki ([[phab:T255489|T255489]]; sync 1/2) (duration: 01m 06s)
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f7e03292941d0d782437862f406efa7e1c6463e}}: Fix overindentation (duration: 01m 08s)
* 11:11 Lucas_WMDE: EU B&C window done
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' 'wuuwiki.png' 'wuuwiki-1.5x.png' 'wuuwiki-2x.png' {{!}} mwscript purgeList.php # [[phab:T259005|T259005]]
* 11:08 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: [[gerrit:616760{{!}}Change the logo for Wu Wikipedia (T259005)]] (duration: 01m 08s)
* 10:40 vgutierrez: rolling upgrade of ATS to version 8.0.8-1wm2
* 10:21 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: do not offer .ly downloads (duration: 01m 07s)
* 10:19 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/extension.json: do not offer .ly downloads (duration: 01m 20s)
* 10:12 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp3064 and cp3065
* 09:44 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp5006 and cp5012
* 09:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:16 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp4026 and cp4032
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P12115 and previous config saved to /var/cache/conftool/dbconfig/20200729-091528-marostegui.json
* 09:15 vgutierrez: upload trafficserver 8.0.8-1wm2 to apt.wm.o (buster)
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12114 and previous config saved to /var/cache/conftool/dbconfig/20200729-091319-marostegui.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12113 and previous config saved to /var/cache/conftool/dbconfig/20200729-091006-marostegui.json
* 08:55 marostegui: The above was db1112
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121', diff saved to https://phabricator.wikimedia.org/P12112 and previous config saved to /var/cache/conftool/dbconfig/20200729-085504-marostegui.json
* 08:42 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2001.codfw.wmnet
* 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:05 marostegui: Deploy MCR schema change on db1121 (lag will show up on s4), also remove triggers on db1124:3314
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P12111 and previous config saved to /var/cache/conftool/dbconfig/20200729-080442-marostegui.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1141', diff saved to https://phabricator.wikimedia.org/P12110 and previous config saved to /var/cache/conftool/dbconfig/20200729-080318-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12109 and previous config saved to /var/cache/conftool/dbconfig/20200729-075558-marostegui.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12108 and previous config saved to /var/cache/conftool/dbconfig/20200729-074828-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12107 and previous config saved to /var/cache/conftool/dbconfig/20200729-074414-marostegui.json
* 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 XioNoX: standardize mr1-eqiad interfaces
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P12106 and previous config saved to /var/cache/conftool/dbconfig/20200729-062224-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P12105 and previous config saved to /var/cache/conftool/dbconfig/20200729-062009-marostegui.json
* 06:16 XioNoX: standardize mr1-codfw interfaces
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P12104 and previous config saved to /var/cache/conftool/dbconfig/20200729-061450-marostegui.json
* 06:05 XioNoX: standardize mr1-ulsfo interfaces
* 06:01 legoktm: ssh doc1001.eqiad.wmnet sudo -u doc-uploader git -C /srv/docroot pull
* 05:52 XioNoX: standardize mr1-eqsin interfaces
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P12103 and previous config saved to /var/cache/conftool/dbconfig/20200729-050346-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P12102 and previous config saved to /var/cache/conftool/dbconfig/20200729-050247-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1142', diff saved to https://phabricator.wikimedia.org/P12101 and previous config saved to /var/cache/conftool/dbconfig/20200729-050204-marostegui.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P12100 and previous config saved to /var/cache/conftool/dbconfig/20200729-045859-marostegui.json
* 02:19 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enable lilypond in safe mode (duration: 01m 09s)
* 01:47 tstarling@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 07s)
* 01:45 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 08s)
* 01:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet
* 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet
* 00:48 ryankemper: sudo -E cumin -b 10 'A:wdqs-all' 'sudo run-puppet-agent'
* 00:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
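
The entries above follow a regular shape (`* HH:MM actor: message`, where the actor is a nick or `user@host`), so they can be read mechanically. The following is a minimal sketch, not part of any Wikimedia tooling: the names `Entry`, `parse_entry` and `cookbook_status` are hypothetical, and the regular expressions are assumptions inferred only from the lines shown in this log.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed entry shape: "* HH:MM actor: message". The actor is either a
# bare nick ("XioNoX") or user@host ("marostegui@cumin1001").
ENTRY_RE = re.compile(r"^\* (\d{2}):(\d{2}) ([^\s:]+): (.*)$")

# Assumed cookbook marker shape, as seen in the log:
# "START - Cookbook <name>" or "END (PASS|FAIL|ERROR) - Cookbook <name>".
COOKBOOK_RE = re.compile(r"^(START|END \((PASS|FAIL|ERROR)\)) - Cookbook (\S+)")


class Entry(NamedTuple):
    hour: int
    minute: int
    user: str
    host: Optional[str]
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one log bullet; return None if the line is not an entry."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    hour, minute, actor, message = m.groups()
    # Split "user@host"; host is None for plain nicks.
    user, _, host = actor.partition("@")
    return Entry(int(hour), int(minute), user, host or None, message)


def cookbook_status(message: str) -> Optional[Tuple[str, str]]:
    """Return (status, cookbook_name) for cookbook entries, else None.

    status is "START", "PASS", "FAIL" or "ERROR".
    """
    m = COOKBOOK_RE.match(message)
    if m is None:
        return None
    status = m.group(2) or "START"
    return status, m.group(3)
```

For example, `parse_entry("* 06:16 XioNoX: standardize mr1-codfw interfaces")` yields an entry with no host, while a `marostegui@cumin1001` entry yields `user="marostegui"` and `host="cumin1001"`. Headings such as `== 2020-07-28 ==` do not match and return `None`.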


== 2020-07-28 ==
== 2021-12-02 ==
* 23:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:05 legoktm: re-pooling mw1414 following testing
* 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: reduce mlr window size on enwiki (duration: 01m 05s)
* 19:35 legoktm: installing yaml PHP extension on canaries
* 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:29 andrewbogott: upgrading wikitech-static deb packages as well as moving to mediawiki 1.37.0
* 23:34 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: cirrus: reduce mlr window size on enwiki (duration: 01m 06s)
* 19:26 majavah: UTC evening deploys done
* 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:26 taavi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/webUIScroll.js: Backport: [[gerrit:743227{{!}}Update scroll instrument (T294246)]] (duration: 00m 56s)
* 23:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720363{{!}}Drop old config names for CentralAuth denylist controls (T277932)]] (duration: 00m 56s)
* 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused setting $wgGEHomepageSuggestedEditsNewAccountInitiatedPercentage (no-op) (duration: 01m 06s)
* 19:12 taavi@deploy1002: Synchronized wmf-config: Config: [[gerrit:739032{{!}}GrowthExperiments configuration fixes (T294737)]] (duration: 00m 57s)
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1046.eqiad.wmnet
* 18:56 legoktm: upgraded scap to 4.1.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary ([[phab:T296867|T296867]])
* 22:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet
* 18:45 legoktm: uploaded scap 4.1.0 to apt.wm.o ([[phab:T296867|T296867]])
* 21:27 dancy@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:24 dancy@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:19 vgutierrez: re-enable puppet on cp3064 - [[phab:T296874|T296874]]
* 21:17 dancy@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:14 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:51 vgutierrez: puppet disabled on cp3064 to manually increase number of maxconns in HAProxy - [[phab:T296874|T296874]]
* 20:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:38 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/743216/; as a result of the fix `'-Dwdqs.throttling-filter.time-bucket-capacity-in-seconds=240', '-Dwdqs.throttling-filter.time-bucket-refill-amount-in-seconds=120', '-Dwdqs.throttling-filter.ban-duration-in-minutes=60'` will now be in the `extra_jvm_opts` for `wdqs-internal` hosts
* 20:02 eileen: process-control config revision is {{Gerrit|b6ece03513}}
* 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17997 and previous config saved to /var/cache/conftool/dbconfig/20211202-145151-marostegui.json
* 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17996 and previous config saved to /var/cache/conftool/dbconfig/20211202-143646-marostegui.json
* 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17995 and previous config saved to /var/cache/conftool/dbconfig/20211202-142141-marostegui.json
* 19:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17994 and previous config saved to /var/cache/conftool/dbconfig/20211202-140636-marostegui.json
* 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17993 and previous config saved to /var/cache/conftool/dbconfig/20211202-140557-marostegui.json
* 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 19:24 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17992 and previous config saved to /var/cache/conftool/dbconfig/20211202-140548-marostegui.json
* 19:23 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17990 and previous config saved to /var/cache/conftool/dbconfig/20211202-135043-marostegui.json
* 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P12097 and previous config saved to /var/cache/conftool/dbconfig/20200728-191926-marostegui.json
* 13:49 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in eqiad
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1147', diff saved to https://phabricator.wikimedia.org/P12096 and previous config saved to /var/cache/conftool/dbconfig/20200728-191237-marostegui.json
* 13:37 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in codfw
* 19:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries (duration: 00m 53s)
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17989 and previous config saved to /var/cache/conftool/dbconfig/20211202-133538-marostegui.json
* 19:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17988 and previous config saved to /var/cache/conftool/dbconfig/20211202-132034-marostegui.json
* 19:09 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17987 and previous config saved to /var/cache/conftool/dbconfig/20211202-131959-marostegui.json
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12095 and previous config saved to /var/cache/conftool/dbconfig/20200728-190933-marostegui.json
* 13:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 19:06 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12094 and previous config saved to /var/cache/conftool/dbconfig/20200728-190517-marostegui.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17986 and previous config saved to /var/cache/conftool/dbconfig/20211202-131949-marostegui.json
* 19:03 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17985 and previous config saved to /var/cache/conftool/dbconfig/20211202-130444-marostegui.json
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12093 and previous config saved to /var/cache/conftool/dbconfig/20200728-190137-marostegui.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17983 and previous config saved to /var/cache/conftool/dbconfig/20211202-124940-marostegui.json
* 18:35 cdanis: ✔️ cdanis@lvs1015.eqiad.wmnet ~ 🕝☕ sudo ipvsadm -D -t 10.2.2.51:9283
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17982 and previous config saved to /var/cache/conftool/dbconfig/20211202-123435-marostegui.json
* 18:29 cdanis: ❌cdanis@lvs1016.eqiad.wmnet ~ 🕝☕ sudo ipvsadm -D -t 10.2.2.51:9283
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17981 and previous config saved to /var/cache/conftool/dbconfig/20211202-123356-marostegui.json
* 18:29 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/GrowthExperiments/extension.json: Fix reference to MentorChangeLogFormatter ([[phab:T259041|T259041]]) (duration: 01m 05s)
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (2 of 2) (duration: 00m 58s)
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (1 of 2) (duration: 01m 05s)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17980 and previous config saved to /var/cache/conftool/dbconfig/20211202-123348-marostegui.json
* 18:16 cdanis: primary pybal restart ✔️ cdanis@lvs1015.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
* 12:31 moritzm: installing NSS security updates
* 18:14 cdanis: backup pybal restart: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
* 12:27 Lucas_WMDE: UTC morning backport+config window done
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743116{{!}}Wikisource: enable proofreading change-tagging for all Wikisources (T289140)]] (duration: 00m 57s)
* 18:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17979 and previous config saved to /var/cache/conftool/dbconfig/20211202-121843-marostegui.json
* 18:05 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/includes/libs/filebackend/SwiftFileBackend.php: Fix index error in SwiftFileBackend ([[phab:T259023|T259023]]) (duration: 01m 07s)
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2009.codfw.wmnet with OS buster
* 17:46 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17978 and previous config saved to /var/cache/conftool/dbconfig/20211202-120338-marostegui.json
* 17:46 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 17:41 volans: run apt-get clean on wtp[1046,1048].eqiad.wmnet and wtp2001.codfw.wmnet to free ~2GB as they were 100% full - [[phab:T258775|T258775]]
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17977 and previous config saved to /var/cache/conftool/dbconfig/20211202-114833-marostegui.json
* 17:33 XioNoX: standardize mr1-esams interfaces
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17976 and previous config saved to /var/cache/conftool/dbconfig/20211202-114755-marostegui.json
* 17:30 brennen@deploy1001: sync aborted: (no justification provided) (duration: 28m 53s)
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 17:03 brennen: prior scap sync for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/616842 ([[phab:T259023|T259023]])
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 17:02 brennen@deploy1001: Started scap: (no justification provided)
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign (duration: 04m 33s)
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17975 and previous config saved to /var/cache/conftool/dbconfig/20211202-114711-marostegui.json
* 16:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17974 and previous config saved to /var/cache/conftool/dbconfig/20211202-113206-marostegui.json
* 16:45 XioNoX: remove mr1-codfw source NAT (not used)
* 11:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:43 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet
* 11:21 moritzm: draining primary/secondary instances off ganeti2022 [[phab:T296622|T296622]]
* 16:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17973 and previous config saved to /var/cache/conftool/dbconfig/20211202-111702-marostegui.json
* 16:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17972 and previous config saved to /var/cache/conftool/dbconfig/20211202-110157-marostegui.json
* 16:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17971 and previous config saved to /var/cache/conftool/dbconfig/20211202-110120-marostegui.json
* 16:33 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
* 11:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:31 XioNoX: mr1-eqiad# delete security nat source rule-set mgmt-to-untrust  (unused, no matching ACL)
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17970 and previous config saved to /var/cache/conftool/dbconfig/20211202-110110-marostegui.json
* 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17969 and previous config saved to /var/cache/conftool/dbconfig/20211202-104606-marostegui.json
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17968 and previous config saved to /var/cache/conftool/dbconfig/20211202-103100-marostegui.json
* 16:21 hnowlan: imported envoyproxy 1.15.0-1 deb into component/envoy-future for buster-wikimedia
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17967 and previous config saved to /var/cache/conftool/dbconfig/20211202-101555-marostegui.json
* 16:11 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17966 and previous config saved to /var/cache/conftool/dbconfig/20211202-101522-marostegui.json
* 16:09 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 15:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 15:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17964 and previous config saved to /var/cache/conftool/dbconfig/20211202-100307-marostegui.json
* 15:45 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1035.*
* 09:52 moritzm: draining primary/secondary instances off ganeti2009 [[phab:T296622|T296622]]
* 15:44 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1034.*
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17963 and previous config saved to /var/cache/conftool/dbconfig/20211202-094802-marostegui.json
* 15:35 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: once more (duration: 03m 06s)
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17962 and previous config saved to /var/cache/conftool/dbconfig/20211202-093257-marostegui.json
* 15:32 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: once more
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 15:32 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: CR613642 (duration: 03m 38s)
* 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 15:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1045.eqiad.wmnet
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17961 and previous config saved to /var/cache/conftool/dbconfig/20211202-091753-marostegui.json
* 15:30 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17960 and previous config saved to /var/cache/conftool/dbconfig/20211202-091629-marostegui.json
* 15:30 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1044.eqiad.wmnet
* 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet
* 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:28 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: CR613642
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 15:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 15:17 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2010.codfw.wmnet with OS buster
* 15:16 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:29 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 4h)
* 15:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 15:14 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 02:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:13 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 02:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:11 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 02:40 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:08 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 02m 14s)
* 02:15 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:06 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
* 02:14 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:01 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 00m 11s)
* 01:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 15:01 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
* 01:21 ryankemper: [[phab:T280001|T280001]] Rolling restart of low-traffic pybal hosts complete. All of `wcqs` is pooled and the pybal / ipvs related alerts have cleared
* 14:58 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1043.eqiad.wmnet
* 01:16 ryankemper: [[phab:T280001|T280001]] Pooled `wcqs200[1-3]` (had been left unpooled from when we last removed wcqs from production)
* 14:58 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet
* 01:12 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2009*,lvs1015*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 14:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 01:11 ryankemper: [[phab:T280001|T280001]] Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
* 14:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1042.eqiad.wmnet
* 01:08 ryankemper: [[phab:T280001|T280001]] Sanity check of `sudo ipvsadm -L -n` on backups `lvs2010` and `lvs1016` looks good (e.g. `lvs1016` has `TCP  10.2.2.67:443 wrr`)
* 14:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet
* 01:07 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2010*,lvs1016*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 14:52 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 01:02 ryankemper: [[phab:T280001|T280001]] `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
* 14:48 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 01:01 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841
* 14:23 herron: bounced centrallog rsyslog services in codfw/eqiad
* 01:00 ryankemper: [[phab:T280001|T280001]] About to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841 to bring `wcqs` into state `lvs_setup`, after which I'll perform a rolling restart of pybal
* 14:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/skins/Vector/: {{Gerrit|a7586cd4a2559248ea1fd29cf74de535de016501}}: Update scroll observer to allow event logging ([[phab:T292586|T292586]]) (duration: 00m 57s)
* 14:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P12087 and previous config saved to /var/cache/conftool/dbconfig/20200728-140313-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P12086 and previous config saved to /var/cache/conftool/dbconfig/20200728-140249-marostegui.json
* 14:02 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12085 and previous config saved to /var/cache/conftool/dbconfig/20200728-140220-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12084 and previous config saved to /var/cache/conftool/dbconfig/20200728-140207-marostegui.json
* 14:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 moritzm: installing perl security updates
* 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1041.eqiad.wmnet
* 13:55 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet
* 13:50 godog: remove stale ipvs thanos-query service on port 80
* 13:39 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1040.eqiad.wmnet
* 13:38 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet
* 13:38 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1039.eqiad.wmnet
* 13:37 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
* 13:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1038.eqiad.wmnet
* 13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
* 13:29 godog: roll-restart pybal on eqiad lvs low-traffic to change port for thanos-query
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P12083 and previous config saved to /var/cache/conftool/dbconfig/20200728-132520-marostegui.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 with less weight', diff saved to https://phabricator.wikimedia.org/P12082 and previous config saved to /var/cache/conftool/dbconfig/20200728-132023-marostegui.json
* 13:09 godog: roll-restart pybal on lvs low-traffic to apply thanos-query changes
* 13:04 XioNoX: standardize cr3-esams interfaces
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.2
* 12:41 XioNoX: standardize cr2-esams interfaces
* 12:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P12081 and previous config saved to /var/cache/conftool/dbconfig/20200728-123201-marostegui.json
* 12:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1037.eqiad.wmnet
* 12:14 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1036.eqiad.wmnet
* 12:08 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1035.eqiad.wmnet
* 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet
* 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet
* 12:05 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet
* 12:04 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1034.eqiad.wmnet
* 12:04 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disabling lilypond rendering in Score again due to error running gs (duration: 01m 05s)
* 11:56 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling Score in safe mode (duration: 01m 04s)
* 11:50 Urbanecm: EU B&C window done
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1a5672628b82709350ca74bb784197e7ff5fdc19}}: Add Turkish powered by MW and Wikimedia project icons ([[phab:T257732|T257732]]) (duration: 00m 59s)
* 11:46 urbanecm@deploy1001: Synchronized static/images/footer/: {{Gerrit|1a5672628b82709350ca74bb784197e7ff5fdc19}}: Add Turkish powered by MW and Wikimedia project icons ([[phab:T257732|T257732]]) (duration: 01m 01s)
* 11:43 urbanecm@deploy1001: Synchronized static/images: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 01m 02s)
* 11:38 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw [[phab:T256682|T256682]]
* 11:38 ema: A:cp-text varnish ban pt.wikiversity.org [[phab:T256750|T256750]]
* 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 00m 58s)
* 11:36 ema: A:cp-text varnish ban fr.wiktionary.org [[phab:T256750|T256750]]
* 11:35 urbanecm@deploy1001: Synchronized static/images/footer: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 01m 05s)
* 11:34 ema: A:cp-text varnish ban eu.wikipedia.org [[phab:T256750|T256750]]
* 11:32 ema: A:cp-text varnish ban he.wikipedia.org [[phab:T256750|T256750]]
* 11:30 marostegui: Deploy MCR change on db1143, db1148, db1146:3314
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12079 and previous config saved to /var/cache/conftool/dbconfig/20200728-113009-marostegui.json
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04c7ef94bb7901668f2a8df3289b6a59d42f0a7e}}: Undeploy graphoid for phase 2 wikis ([[phab:T258463|T258463]]) (duration: 01m 00s)
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1143', diff saved to https://phabricator.wikimedia.org/P12078 and previous config saved to /var/cache/conftool/dbconfig/20200728-112850-marostegui.json
* 11:25 ema: A:cp-text varnish ban fa.wikipedia.org [[phab:T256750|T256750]]
* 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] use more neutral config var names (duration: 01m 06s)
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12077 and previous config saved to /var/cache/conftool/dbconfig/20200728-112046-marostegui.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12076 and previous config saved to /var/cache/conftool/dbconfig/20200728-111522-marostegui.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12075 and previous config saved to /var/cache/conftool/dbconfig/20200728-111226-marostegui.json
* 11:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614890|desktop improvements by default for testing group (round 2) (T254227)]] (duration: 01m 06s)
* 11:09 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 hashar@deploy1001: Finished deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for [[phab:T149924|T149924]] (duration: 00m 06s)
* 10:56 hashar@deploy1001: Started deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for [[phab:T149924|T149924]]
* 10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:53 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1033.eqiad.wmnet
* 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet
* 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet
* 10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1032.eqiad.wmnet
* 10:47 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet
* 10:33 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1031.eqiad.wmnet
* 10:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12074 and previous config saved to /var/cache/conftool/dbconfig/20200728-102342-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12072 and previous config saved to /var/cache/conftool/dbconfig/20200728-100412-marostegui.json
* 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:55 XioNoX: standardize cr2-esams interfaces
* 09:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:35 moritzm: imported libmysqlclient18 to component/cloudera [[phab:T258768|T258768]]
* 09:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1030.eqiad.wmnet
* 09:28 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1029.eqiad.wmnet
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12070 and previous config saved to /var/cache/conftool/dbconfig/20200728-092606-marostegui.json
* 09:24 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1028.eqiad.wmnet
* 09:19 XioNoX: standardize cr3-eqsin interfaces
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12069 and previous config saved to /var/cache/conftool/dbconfig/20200728-091849-marostegui.json
* 09:18 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1027.eqiad.wmnet
* 09:10 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet
* 09:07 ema: cp3050: restart varnishmtail.service, stuck on "Condition(c->offset <= c->vtx->len) not true."
* 08:39 XioNoX: standardize cr2-eqsin interfaces
* 08:38 godog: temporarily downgrade prometheus-snmp-exporter on netmon2001
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12067 and previous config saved to /var/cache/conftool/dbconfig/20200728-083336-marostegui.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P12066 and previous config saved to /var/cache/conftool/dbconfig/20200728-083209-marostegui.json
* 08:20 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.2 (duration: 53m 11s)
* 08:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 08:06 godog: failover librenms/smokeping to netmon2001 - [[phab:T247967|T247967]]
* 08:04 marostegui: Reduce labsdb1009 weight
* 07:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:48 jayme: depooled wtp1026.eqiad.wmnet for reimage
* 07:48 moritzm: switched superset to CAS
* 07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:46 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:43 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet
* 07:27 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.2
* 07:03 liw: 1.36.0-wmf.2 was branched at {{Gerrit|04e863fdf3646ee6ed5c05b784f85c9f323e1f19}} for [[phab:T257970|T257970]]
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12065 and previous config saved to /var/cache/conftool/dbconfig/20200728-051928-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314 and restore db1146:3314 original weight', diff saved to https://phabricator.wikimedia.org/P12064 and previous config saved to /var/cache/conftool/dbconfig/20200728-051813-marostegui.json
* 02:17 eileen: process-control config revision is {{Gerrit|6811ca294a}} - just delayed silverpop_daily a bit as clashing with dedupe
* 00:18 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1003.eqiad.wmnet
* 00:17 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1003.eqiad.wmnet


== 2020-07-27 ==
== 2021-12-01 ==
* 23:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support (duration: 00m 54s)
* 22:15 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 23:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support
* 22:15 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 23:28 mutante: otrs1001 - ran puppet (it was alerting in icinga that puppet failed, but it was neither disabled nor failing and changed nothing when it ran)
* 22:13 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 21:31 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deployed CentralNotice CSP config change for [[phab:T258459|T258459]] (duration: 00m 57s)
* 22:13 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 21:10 sbassett: Deployed mitigations for [[phab:T238075|T238075]]
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 20:41 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/InterwikiSorting/: {{Gerrit|c5f6c97856a5dbe673064afd2804bebb9b787580}}: Use LanguageLinksHook to sort interwiki links ([[phab:T257625|T257625]]) (duration: 00m 59s)
* 22:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 19:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 01m 23s)
* 19:44 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:11 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 19:36 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:23 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 19:19 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:11 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 19:06 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 22:09 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:00 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 22:09 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 18:57 urbanecm@deploy1001: sync-file aborted: {{Gerrit|3833b135caf4171daa0814eba81393b6c44db619}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 00m 04s)
* 21:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|c6a9674366d9c8d273ce0e74dfb6a04c91d64307}}: Move footer logos to wmg* variables ([[phab:T257732|T257732]]) (duration: 00m 56s)
* 21:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 57s)
* 21:11 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 16s)
* 18:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 18:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c6a9674366d9c8d273ce0e74dfb6a04c91d64307}}: Move footer logos to wmg* variables ([[phab:T257732|T257732]]) (duration: 00m 57s)
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257]: (no justification provided)
* 18:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable desktop web UI click tracking instrumentation on frwiki, hewiki, fawiki ([[phab:T258058|T258058]]) (duration: 00m 56s)
* 21:09 razzi@deploy1002: Finished deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794] (duration: 21m 18s)
* 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove WPBSkinBlacklist ([[phab:T254675|T254675]]) (duration: 00m 57s)
* 21:06 jynus: installing python-monotonic on ms-fe2011, ms-fe2012 (breaks swift-proxy)
* 17:42 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.1
* 21:02 jynus: installing python-monotonic on ms-fe2010
* 17:30 liw: promoting train to group2
* 20:48 razzi@deploy1002: Started deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794]
* 17:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 20:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:14 dpifke@deploy1001: Finished deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs ([[phab:T235456|T235456]]) (duration: 00m 05s)
* 19:46 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 17:14 dpifke@deploy1001: Started deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs ([[phab:T235456|T235456]])
* 19:46 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 16:59 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
* 19:30 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 22s)
* 16:58 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1002.eqiad.wmnet
* 19:30 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 16:57 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1002.eqiad.wmnet
* 19:27 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 02m 26s)
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1003.eqiad.wmnet
* 19:25 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1002.eqiad.wmnet
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1001.eqiad.wmnet
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1003.eqiad.wmnet
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 16:49 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1001.eqiad.wmnet
* 19:18 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1003.wikimedia.org
* 19:18 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1002.wikimedia.org
* 19:13 majavah: UTC evening deploys done
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1001.wikimedia.org
* 19:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742834{{!}}Add mediawiki.web_ui_scroll stream (T292586)]] (duration: 00m 57s)
* 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1001.eqiad.wmnet
* 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1002.eqiad.wmnet
* 18:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:47 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1003.eqiad.wmnet
* 18:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS buster
* 16:44 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cumin1001.eqiad.wmnet
* 18:39 vgutierrez: pool cp1089 using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316, db2087:3317 after on-site maintenance [[phab:T258587|T258587]]', diff saved to https://phabricator.wikimedia.org/P12063 and previous config saved to /var/cache/conftool/dbconfig/20200727-163311-marostegui.json
* 17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS buster
* 16:05 marostegui: Will show up on labsdb hosts for s5
* 17:54 vgutierrez: depool cp1089 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:04 marostegui: Stop MySQL on db1082 for onsite maintenance - [[phab:T258910|T258910]]
* 16:08 moritzm: installing postgresql-9.6 security updates
* 15:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 godog: bounce logstash on eqiad/codfw to apply template changes
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 14:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1146:3314 weight while db1144:3314 is depooled', diff saved to https://phabricator.wikimedia.org/P12060 and previous config saved to /var/cache/conftool/dbconfig/20200727-145010-marostegui.json
* 15:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 14:48 marostegui: Deploy MCR change on db1144:3314
* 15:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12059 and previous config saved to /var/cache/conftool/dbconfig/20200727-144807-marostegui.json
* 15:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149', diff saved to https://phabricator.wikimedia.org/P12058 and previous config saved to /var/cache/conftool/dbconfig/20200727-144034-marostegui.json
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17955 and previous config saved to /var/cache/conftool/dbconfig/20211201-150853-marostegui.json
* 14:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17954 and previous config saved to /var/cache/conftool/dbconfig/20211201-145348-marostegui.json
* 14:19 XioNoX: standardize cr1-codfw interfaces
* 14:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 14:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17953 and previous config saved to /var/cache/conftool/dbconfig/20211201-143843-marostegui.json
* 14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard1001.eqiad.wmnet
* 14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 13:57 moritzm: upgrading idp2001 to CAS 6.1.7.1
* 14:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard2001.codfw.wmnet
* 13:19 XioNoX: standardize some cr2-esams interfaces
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17951 and previous config saved to /var/cache/conftool/dbconfig/20211201-142339-marostegui.json
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 in main traffic', diff saved to https://phabricator.wikimedia.org/P12057 and previous config saved to /var/cache/conftool/dbconfig/20200727-131123-marostegui.json
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17950 and previous config saved to /var/cache/conftool/dbconfig/20211201-142227-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with normal weight and pool db1089 into vslow', diff saved to https://phabricator.wikimedia.org/P12056 and previous config saved to /var/cache/conftool/dbconfig/20200727-130954-marostegui.json
* 14:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12055 and previous config saved to /var/cache/conftool/dbconfig/20200727-130713-marostegui.json
* 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17949 and previous config saved to /var/cache/conftool/dbconfig/20211201-142219-marostegui.json
* 13:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12054 and previous config saved to /var/cache/conftool/dbconfig/20200727-125824-marostegui.json
* 14:13 jynus: started commonswiki codfw media backup at 8 threads of parallelism
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12053 and previous config saved to /var/cache/conftool/dbconfig/20200727-125351-marostegui.json
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17948 and previous config saved to /var/cache/conftool/dbconfig/20211201-140715-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12052 and previous config saved to /var/cache/conftool/dbconfig/20200727-125207-marostegui.json
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12051 and previous config saved to /var/cache/conftool/dbconfig/20200727-125045-marostegui.json
* 13:56 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:41 marostegui: Compress innodb on db1106, this will generate lag on enwiki on labsdb hosts (wiki replicas) [[phab:T254462|T254462]]
* 13:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:38 moritzm: disable puppet on idp1001/2001
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17947 and previous config saved to /var/cache/conftool/dbconfig/20211201-135210-marostegui.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 and pool db1105:3311 as vslow [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P12050 and previous config saved to /var/cache/conftool/dbconfig/20200727-123833-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17946 and previous config saved to /var/cache/conftool/dbconfig/20211201-133705-marostegui.json
* 12:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17945 and previous config saved to /var/cache/conftool/dbconfig/20211201-133554-marostegui.json
* 12:37 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:37 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17944 and previous config saved to /var/cache/conftool/dbconfig/20211201-133546-marostegui.json
* 12:36 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 13:30 moritzm: set "sudo gnt-cluster modify --hypervisor-parameters kvm:machine_version=pc-i440fx-2.8" for ganeti eqiad cluster [[phab:T294120|T294120]]
* 12:31 XioNoX: standardize cr2-codfw interfaces
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17942 and previous config saved to /var/cache/conftool/dbconfig/20211201-132041-marostegui.json
* 12:28 volans@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: Release v0.2.7 (duration: 00m 27s)
* 13:19 vgutierrez: restore haproxy 2.2.9 on cp3064 - [[phab:T290005|T290005]]
* 12:28 volans@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: Release v0.2.7
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17939 and previous config saved to /var/cache/conftool/dbconfig/20211201-130536-marostegui.json
* 12:25 jbond42: upload new cas package to buster-wikimedia
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17938 and previous config saved to /var/cache/conftool/dbconfig/20211201-125031-marostegui.json
* 12:25 jbond42: upload new cas package
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17937 and previous config saved to /var/cache/conftool/dbconfig/20211201-124919-marostegui.json
* 12:23 ema: A:cp rolling varnish-frontend restart to actually discard old VCL still pointing at varnishcheck/check [[phab:T255015|T255015]] [[phab:T236754|T236754]]
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:21 moritzm: installing ruby-json security updates
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:16 moritzm: installing batik security updates
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17936 and previous config saved to /var/cache/conftool/dbconfig/20211201-122020-marostegui.json
* 11:59 marostegui: Deploy MCR schema change on db1149
* 12:11 urbanecm: EU B&C window done
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12049 and previous config saved to /var/cache/conftool/dbconfig/20200727-115818-marostegui.json
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8ab29b2feb47d611873cf0465b2a2dd5eac0ad2}}: enwikisource: enable anonymous talk page mobile tabs ([[phab:T47955|T47955]]) (duration: 00m 56s)
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1138', diff saved to https://phabricator.wikimedia.org/P12048 and previous config saved to /var/cache/conftool/dbconfig/20200727-115739-marostegui.json
* 12:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2bd14e8968c90b2562f045457d61b252728e6250}}: Add templateeditor group and protection level at viwiki ([[phab:T296154|T296154]]) (duration: 00m 56s)
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138', diff saved to https://phabricator.wikimedia.org/P12047 and previous config saved to /var/cache/conftool/dbconfig/20200727-115258-marostegui.json
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17935 and previous config saved to /var/cache/conftool/dbconfig/20211201-120515-marostegui.json
* 11:28 moritzm: installing an-tool1009 [[phab:T258768|T258768]]
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17934 and previous config saved to /var/cache/conftool/dbconfig/20211201-115011-marostegui.json
* 10:54 ema: upload atskafka 0.10 to buster-wikimedia, upgrade cp3050 [[phab:T254317|T254317]]
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17933 and previous config saved to /var/cache/conftool/dbconfig/20211201-113506-marostegui.json
* 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:616463{{!}} Bumping portals to master (616463)]] (duration: 01m 05s)
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17932 and previous config saved to /var/cache/conftool/dbconfig/20211201-113354-marostegui.json
* 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:616463{{!}} Bumping portals to master (616463)]] (duration: 01m 10s)
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:33 XioNoX: make cr*-ulsfo interfaces netbox compliant
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 08:39 XioNoX: push "Add 185.71.138.0/24 to wikimedia4" to all routers
* 11:31 vgutierrez: test HAProxy 2.4.9 on cp3064 - [[phab:T290005|T290005]]
* 07:00 marostegui: Deploy schema change on s5 codfw [[phab:T256682|T256682]]
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:44 elukey: truncate big log file on an-launcher1002 that is filling up the /srv partition
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:36 elukey: apt-get clean on netbox1001 to free some space
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17931 and previous config saved to /var/cache/conftool/dbconfig/20211201-112952-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12043 and previous config saved to /var/cache/conftool/dbconfig/20200727-051156-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17930 and previous config saved to /var/cache/conftool/dbconfig/20211201-111448-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for on-site maintenance [[phab:T258587|T258587]]', diff saved to https://phabricator.wikimedia.org/P12042 and previous config saved to /var/cache/conftool/dbconfig/20200727-050058-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17929 and previous config saved to /var/cache/conftool/dbconfig/20211201-105943-marostegui.json
* 04:58 marostegui: Stop MySQL on db2087 for on-site maintenance [[phab:T258587|T258587]]
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17928 and previous config saved to /var/cache/conftool/dbconfig/20211201-104438-marostegui.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17927 and previous config saved to /var/cache/conftool/dbconfig/20211201-104316-marostegui.json
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17926 and previous config saved to /var/cache/conftool/dbconfig/20211201-104308-marostegui.json
* 10:29 Lucas_WMDE: Deployed patch for [[phab:T296578|T296578]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17925 and previous config saved to /var/cache/conftool/dbconfig/20211201-102804-marostegui.json
* 10:23 vgutierrez: test haproxy_2.2.19-1~bpo10+1 on cp3064 - [[phab:T290005|T290005]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17924 and previous config saved to /var/cache/conftool/dbconfig/20211201-101259-marostegui.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17923 and previous config saved to /var/cache/conftool/dbconfig/20211201-095754-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17922 and previous config saved to /var/cache/conftool/dbconfig/20211201-095632-marostegui.json
* 09:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17921 and previous config saved to /var/cache/conftool/dbconfig/20211201-095624-marostegui.json
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:46 taavi@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:742925{{!}}beta: Update mx host]] (duration: 00m 55s)
* 09:43 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwiki extensions/CheckUser/maintenance/fixTrailingSpacesInLogs.php
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17920 and previous config saved to /var/cache/conftool/dbconfig/20211201-094120-marostegui.json
* 09:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: [[gerrit:742853{{!}}Drop using ft_title and ft_namespace (T296380)]] (duration: 00m 56s)
* 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17919 and previous config saved to /var/cache/conftool/dbconfig/20211201-092615-marostegui.json
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17918 and previous config saved to /var/cache/conftool/dbconfig/20211201-091110-marostegui.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17917 and previous config saved to /var/cache/conftool/dbconfig/20211201-090948-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:03 vgutierrez: rolling restart of haproxy and varnish on O:cache::text_haproxy and O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 08:56 moritzm: draining primary/secondary instance off ganeti2010 [[phab:T296622|T296622]]
* 08:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:32 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:32 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/NewcomerTasksUserOptionsLookup.php: Backport: [[gerrit:742548{{!}}Newcomer tasks: Fix filtering of non-existent task types (T296366)]] (duration: 00m 56s)
* 00:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742817{{!}}Enable A/B test enrollment instrumentation. (T292587)]] (duration: 00m 56s)
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-07-25 ==
== 2021-11-30 ==
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1096:3315 into s5 api after db1082 crashed [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P12041 and previous config saved to /var/cache/conftool/dbconfig/20200725-124104-marostegui.json
* 23:59 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 09:16 oblivian@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P12040 and previous config saved to /var/cache/conftool/dbconfig/20200725-091616-oblivian.json
* 23:57 mutante: deploy1002 - kube_env miscweb staging ; helmfile -e staging destroy
* 01:52 mutante: ganeti - also removing (unmounted) disk 2 (100G) from webperf1002. [[phab:T257931|T257931]]
* 23:56 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 00:46 mutante: ganeti - removing disk 3 (20G) from webperf1002. the disks are 0-indexed, so the ones actually mounted are 0 (50G) and 1 (300G) ([[phab:T257931|T257931]])
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:42 dpifke: Manually compressing some more data on webperf1002, using arclamp-compress-logs from https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615904.
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:09 mutante: gerrit - added Majavah to wmf-deployment group for [[phab:T296777|T296777]]
* 22:30 krinkle@deploy1002: Finished deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}} (duration: 00m 09s)
* 22:30 krinkle@deploy1002: Started deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}}
* 22:21 mutante: welcome Majavah to MediaWiki deployers ([[phab:T296777|T296777]])
* 20:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5443b78f197b782238632966891d721859733a74}}: uzwiki: Deploy Growth features to newcomers ([[phab:T294245|T294245]]) (duration: 00m 57s)
* 18:09 legoktm: uploaded php-yaml for component/php72 ([[phab:T296331|T296331]])
* 18:08 vgutierrez: restart haproxy on cp3064 - [[phab:T290005|T290005]]
* 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17912 and previous config saved to /var/cache/conftool/dbconfig/20211130-174434-jynus.json
* 17:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17911 and previous config saved to /var/cache/conftool/dbconfig/20211130-173935-jynus.json
* 17:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17910 and previous config saved to /var/cache/conftool/dbconfig/20211130-173517-jynus.json
* 17:34 moritzm: installing libvorbis security updates
* 17:15 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 5%', diff saved to https://phabricator.wikimedia.org/P17908 and previous config saved to /var/cache/conftool/dbconfig/20211130-171550-jynus.json
* 17:00 jynus: move db1139:s1 under db1118
* 16:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17907 and previous config saved to /var/cache/conftool/dbconfig/20211130-165718-jynus.json
* 16:29 XioNoX: Move cr2-codfw lumen transit link to BO cable - [[phab:T289241|T289241]]
* 16:26 XioNoX: Move cr2-codfw eqord link to BO cable - [[phab:T289241|T289241]]
* 16:23 XioNoX: Move cr2-codfw pfw3 link to BO cable - [[phab:T289241|T289241]]
* 16:20 Emperor: reboot ms-be2059 to fix device enumeration order re [[phab:T295563|T295563]]
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17906 and previous config saved to /var/cache/conftool/dbconfig/20211130-161457-jynus.json
* 16:13 XioNoX: cr2-codfw bounce fpc 1 pic 0 (vrrp backup) - [[phab:T289241|T289241]]
* 16:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17905 and previous config saved to /var/cache/conftool/dbconfig/20211130-160748-jynus.json
* 16:06 bblack: lvs2007 - repooling into service
* 16:01 bblack: lvs2007 - depooling for network maint - do not push LVS config changes please!
* 15:41 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:41 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:37 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 jforrester@deploy1002: Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster [[phab:T284162|T284162]] (duration: 00m 56s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:25 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17904 and previous config saved to /var/cache/conftool/dbconfig/20211130-131124-marostegui.json
* 13:05 topranks: Running homer against CR routers to adjust loopback4 filter enabling local NTP queries for status. [[phab:T296623|T296623]]
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17903 and previous config saved to /var/cache/conftool/dbconfig/20211130-125620-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17902 and previous config saved to /var/cache/conftool/dbconfig/20211130-124115-marostegui.json
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17901 and previous config saved to /var/cache/conftool/dbconfig/20211130-122610-marostegui.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17900 and previous config saved to /var/cache/conftool/dbconfig/20211130-122555-marostegui.json
* 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard1001.eqiad.wmnet
* 12:02 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 11:50 moritzm: running "sudo gnt-cluster renew-crypto --new-node-certificates --new-rapi-certificate --new-spice-certificate" for Ganeti codfw cluster [[phab:T296622|T296622]]
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui for updates in eqiad
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui in codfw
* 10:39 elukey: rollout wmf-certificates 0~20211129-1 fleet wide (add group/others permissions to the cert bundle)
* 10:30 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:29 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:58 moritzm: installing remaining ICU security updates
* 09:06 Amir1: dropping wikiadmin@localhost from all pooled replicas of s6 ([[phab:T296511|T296511]])
* 08:24 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 6 hours)
* 08:14 Amir1: revoking DROP from wikiadmin on all pooled replicas ([[phab:T249683|T249683]])
* 03:46 ejegg: updated payments-wiki from {{Gerrit|dbc92132}} to {{Gerrit|4a4ef51d}}
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:17 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742524{{!}}Enable scroll tracking for all users (T292586)]] (duration: 00m 55s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:14 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/readingDepth.js: Backport: [[gerrit:742517{{!}}Provide fallback for config variable when not present]] (duration: 00m 55s)
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 catrope@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:738530{{!}}allow sysops to set/remove reviewer group on ckbwiki (T294696)]] (duration: 00m 55s)
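
The entries above follow a regular shape: a bullet, an HH:MM timestamp, an actor (optionally `user@host`), a colon, and a free-text message. As a rough illustration only, the following Python sketch splits one such line into its parts. The names `SalEntry`, `ENTRY_RE` and `parse_entry` are hypothetical and not part of any Wikimedia tooling, and the pattern is an assumption inferred from the lines shown here rather than a documented format.

```python
import re
from typing import NamedTuple, Optional


class SalEntry(NamedTuple):
    """One log line split into its parts (hypothetical helper type)."""
    time: str             # "HH:MM" as written in the log
    user: str             # actor before the optional "@host"
    host: Optional[str]   # host the tool ran on, or None for manual entries
    message: str          # everything after the first ": "


# Assumed shape: "* HH:MM user[@host]: message"
ENTRY_RE = re.compile(r"^\* (\d{2}:\d{2}) ([^\s:@]+)(?:@([^\s:]+))?: (.*)$")


def parse_entry(line: str) -> Optional[SalEntry]:
    """Return the parsed entry, or None for headings and other lines."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return SalEntry(*match.groups())


if __name__ == "__main__":
    print(parse_entry("* 16:06 bblack: lvs2007 - repooling into service"))
    print(parse_entry("== 2021-11-30 =="))
```

Lines that do not match, such as the `== date ==` headings, come back as `None`, so a caller can track the current date separately and attach it to each parsed entry.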


== 2020-07-24 ==
== 2021-11-29 ==
* 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 22:32 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/EntitySchema/src/MediaWiki/Specials/SetEntitySchemaLabelDescriptionAliases.php: Deploy security patch for [[phab:T296578|T296578]] (duration: 00m 55s)
* 20:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 dpifke: Manually gzipping some older ArcLamp data on webperf1002, to free up space and verify new compression support.
* 22:20 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FileImporter/src/Remote/MediaWiki/HttpApiLookup.php: Backport: [[gerrit:742263{{!}}SECURITY: Fix special page displaying unescaped user input (T296605)]] (duration: 00m 56s)
* 19:55 dpifke@deploy1001: Finished deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp (duration: 00m 05s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:55 dpifke@deploy1001: Started deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:55 Amir1: deployment done
* 20:46 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Fix wgWikiLambdaOrchestratorLocation service pointer typo (duration: 00m 55s)
* 16:49 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/RepoHooks.php: [[gerrit:616032{{!}}Prevent onTitleGetRestrictionTypes changing ns0 protections]], Part II (duration: 01m 07s)
* 20:27 tgr: UTC evening deploys done
* 16:47 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/WikibaseRepo.php: [[gerrit:616032{{!}}Prevent onTitleGetRestrictionTypes changing ns0 protections]], Part I (duration: 01m 06s)
* 20:26 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742261{{!}}GrowthExperiments: Start imagerecommendation variant experiment]] (duration: 00m 55s)
* 15:06 reedy@deploy1001: Finished scap: Score backports (duration: 36m 50s)
* 20:23 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: Backport: [[gerrit:742262{{!}}AddImage: Refresh user's task feed after undecided rejection (T296491)]] (duration: 00m 56s)
* 14:30 reedy@deploy1001: Started scap: Score backports
* 20:21 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:742260{{!}}SuggestedEdits: Drop isActivated() check in getJsData (T296626)]] (duration: 00m 56s)
* 13:31 XioNoX: advertise 185.71.138.0/24 from AMS
* 20:17 ejegg: updated payments-wiki from {{Gerrit|d1d6f024}} -> {{Gerrit|dbc92132}}
* 13:17 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/includes/import/ImportableOldRevisionImporter.php: [[gerrit:616029{{!}}Import: use master DB for loading slots.]] ([[phab:T258666|T258666]]) (duration: 01m 07s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:10 eileen: civicrm
* 12:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:48 hnowlan: bootstrapped restbase-dev1004-b
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 hnowlan: started bootstrap of restbase-dev1004-a
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:00 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295705|T295705]] Move CirrusSearch traffic back to eqiad (duration: 00m 56s)
* 10:35 hnowlan: started reimage of restbase-dev1004
* 19:42 legoktm: uploaded php-yaml_2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1_amd64.changes to apt.wm.o ([[phab:T296331|T296331]])
* 09:59 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:48 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:40 kormat: restarting mariadb on all sanitarium hosts [[phab:T258711|T258711]]
* 19:16 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 08:35 akosiaris: start nagios-nrpe-server on kubernetes2002
* 18:55 bblack: repooling esams
* 07:44 elukey: depool wtp1025 - disk full
* 18:48 bblack: esams: shifting depool method to esams-offline (now that its config is fixed)
* 06:30 tstarling@deploy1001: Started scap: for Score
* 18:42 legoktm: depooling esams
* 02:36 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: removing superseded local patch for hard-coding lilypond version (duration: 01m 09s)
* 18:17 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 01:19 ejegg: updated payments-wiki from {{Gerrit|31a3de1130}} to {{Gerrit|c365c136d2}}
* 17:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:742259{{!}}rdbms: Add DB host to TransactionProfiler logging and fix time fields (T295706)]] (duration: 00m 56s)
* 01:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:40 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Initial Beta Cluster deployment of Wikifunctions: III - CS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:38 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:22 jforrester@deploy1002: Synchronized wmf-config/ProductionServices.php: Initial Beta Cluster deployment of Wikifunctions: II - Services for [[phab:T289315|T289315]] (duration: 00m 55s)
* 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 17:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Initial Beta Cluster deployment of Wikifunctions: I - IS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06d8d25f6e89be0b1692d017bdbc2c9524372c0b}}: foundationwiki: Remove explicit wmgUseOAuth (duration: 00m 57s)
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bad34ed8d86b30eb4c240da0498ddfb44af30ea7}}: Make foundationwiki a standard CentralAuth wiki ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 00:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|567f2a9d4883c9a98a3251f153ea0ad58d7774c6}}: Revert "foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily" ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 00:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS buster
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:04 moritzm: sudo gnt-cluster upgrade --to 2.16 for Ganeti codfw cluster
* 00:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 15:51 James_F: Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for [[phab:T284162|T284162]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS buster
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 papaul: power down logstash2028 for IDRAC reset
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 moritzm: gnt-cluster renew-crypto --new-cluster-certificate for codfw Ganeti cluster [[phab:T296622|T296622]]
* 14:40 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:38 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:37 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:55 vgutierrez: repool cp3064 - [[phab:T290005|T290005]]
* 12:51 moritzm: upgrading ganeti codfw cluster to 2.16 backport [[phab:T296622|T296622]]
* 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 12:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|05704407395fbf227eec47cf716393dc60a36a35}}: Fix error handling in SuggestedEdits::getActionData ([[phab:T296366|T296366]]) (duration: 05m 37s)
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7fdea3e71e4fd9e85c30efbc17f94c0711deb252}}: Add planet4589.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T296136|T296136]]) (duration: 00m 56s)
* 12:11 vgutierrez: pool cp3064 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS buster
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:07 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|4662224229cb4083b8b01de436ccd65e8c00e7dd}}: Remove search.wikimedia.org files ([[phab:T289224|T289224]]) (duration: 00m 56s)
* 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (2/2; [[phab:T296297|T296297]]) (duration: 00m 55s)
* 10:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/Special/SpecialMultiLock.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (1/2; [[phab:T296297|T296297]]) (duration: 00m 56s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d01652ec22f6cb3413b419a3c9b0a7a08d79960f}}: Disable Growth IP research survey ([[phab:T294568|T294568]]) (duration: 00m 56s)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:45 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3064.esams.wmnet with OS buster
* 10:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:01 vgutierrez: depool cp3064 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS buster
* 09:52 vgutierrez: pool cp2041 with HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 moritzm: rolling restart of mediawiki canaries to pick up ICU security updates
* 09:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|3a892860b2e1e2ac7b60fc1c4dbdb2035d6af950}}: foundationwiki: Do not enable wmgUsePageViewInfo explicitly (duration: 00m 55s)
* 09:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=foundationwiki 'inactive' # removing nonexistent group; backup left at P17888
* 09:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|786313c06188d5d63700d7e46384ef99a9297b57}}: foundationwiki: Clear group add/remove declarations (duration: 00m 55s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3f47dc55b67d2b53ec27bb610978ff8165aa6ca}}: foundationwiki: Disable hard redirects (duration: 00m 57s)
* 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS buster
* 08:56 vgutierrez: depool cp2041 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 08:54 moritzm: installing ICU security updates on buster
* 08:33 moritzm: installing bluez security updates
* 08:26 moritzm: installing libvpx security updates
* 08:19 moritzm: installing libntlm security updates
* 08:07 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 07m 01s)
* 08:00 marostegui: Restart db2078 and db1117
* 08:00 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 07:31 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time) (duration: 00m 04s)
* 07:31 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time)
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bullseye
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bullseye


== 2020-07-23 ==
== 2021-11-28 ==
* 23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 02m 11s)
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:52 mutante: stashbot quadruple log test
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:21 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables (duration: 00m 34s)
* 21:20 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:09 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:51 ryankemper: restarted blazegraph on codfw wdqs2001
* 18:44 ryankemper: Restarted blazegraph on following codfw wdqs nodes: 2007, 2003, and 2002
* 18:39 Amir1: BACC is done
* 18:29 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613235{{!}}Load WikibaseClient from extension.json file instead of php one (T257437 T256228 T88258)]] (duration: 01m 05s)
* 18:21 mutante: testreduce1001 - rm -rf /srv/testreduce and run puppet to re-clone testreduce to it from the scandium branch ([[phab:T257906|T257906]])
* 18:13 ryankemper: restarted blazegraph on 2001
* 17:59 ryankemper: sudo -E cumin -b 10 'A:wdqs-all and not A:wdqs-test and not P<nowiki>{</nowiki>wdqs1003.eqiad.wmnet<nowiki>}</nowiki> and not P<nowiki>{</nowiki>wdqs2001.codfw.wmnet<nowiki>}</nowiki>' 'sudo systemctl restart wdqs-blazegraph.service'
* 17:53 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin -b10 'wdqs*' "run-puppet-agent --unless-version 1a4ae81"
* 17:52 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs.*,name=codfw
* 17:35 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs.*,name=codfw
* 17:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 16:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 16:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 15:36 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 13:49 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=.*
* 12:29 marostegui: Decrease labsdb1009 weight a bit, as it is lagging again.
* 12:23 XioNoX: remove bogus lo0 IPs from cr3-knams
* 12:21 Urbanecm: Staging at mwdebug1001 ended, run scap pull to clean changes
* 12:17 Urbanecm: Staging at mwdebug1001 again
* 12:02 Urbanecm: Staging at mwdebug1001 ended, run scap pull to clean changes
* 12:00 Urbanecm: Staging at mwdebug1001
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|745ff20f53e4914cf6e1717c963419e74b68e693}}: Log ClosedWikiProviders start with info level ([[phab:T258695|T258695]]) (duration: 01m 05s)
* 11:48 marostegui: Deploy MCR schema change on db1145:3314
* 11:36 dcausse: European mid-day backport window done
* 11:31 dcausse@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase: [[phab:T258507|T258507]]: Fix bug that causes wrong prefixes in RDF output (duration: 01m 11s)
* 11:18 akosiaris: depool scb in mobileapps/eqiad. [[phab:T218733|T218733]]
* 11:17 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb.*
* 11:13 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T258474|T258474]]: [sdoc] fix entity source base URIs (duration: 01m 07s)
* 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb.*
* 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb*
* 10:25 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1002.*
* 10:24 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.*
* 10:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:14 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:11 akosiaris: pool kubernetes in mobileapps/eqiad. [[phab:T218733|T218733]]
* 10:11 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
* 10:06 volans@deploy1001: Finished deploy [debmonitor/deploy@16d0c45]: Release v0.2.6 (duration: 00m 36s)
* 10:06 volans@deploy1001: Started deploy [debmonitor/deploy@16d0c45]: Release v0.2.6
* 10:05 volans@deploy1001: Finished deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6 (duration: 00m 14s)
* 10:05 volans@deploy1001: Started deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6
* 10:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:51 akosiaris: prepare for pooling kubernetes mobileapps capacity in eqiad. [[phab:T218733|T218733]]
* 09:51 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
* 09:46 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:38 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:24 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:19 akosiaris: lower replica count back to 80 for mobileapps. [[phab:T218733|T218733]]
* 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:02 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 08:59 marostegui: transfer --type=xtrabackup from db1117:3322 to db1107 [[phab:T257540|T257540]]
* 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:42 godog: test librenms poller from netmon2001
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 XioNoX: remove pim-rp IPs from last routers - [[phab:T257573|T257573]]
* 08:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:29 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1107 from s1 [[phab:T257540|T257540]]', diff saved to https://phabricator.wikimedia.org/P12025 and previous config saved to /var/cache/conftool/dbconfig/20200723-082647-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to move it to m2 [[phab:T257540|T257540]]', diff saved to https://phabricator.wikimedia.org/P12024 and previous config saved to /var/cache/conftool/dbconfig/20200723-081650-marostegui.json
* 05:29 marostegui: Restore labsdb1009's original weight
* 00:24 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (2/2) (duration: 01m 08s)
* 00:22 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/libs/rdbms/database/Database.php: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (duration: 01m 05s)
* 00:20 legoktm@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 00:16 legoktm@deploy1001: Synchronized php-1.36.0-wmf.1/includes/: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (duration: 01m 09s)
* 00:11 legoktm@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)


== 2020-07-22 ==
== 2021-11-27 ==
* 22:07 cdanis: remove downtime on api.svc.codfw.wmnet [[phab:T258614|T258614]]
* 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]] (duration: 04m 14s)
* 19:26 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.1 (duration: 01m 03s)
* 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]]
* 19:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.1
* 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
* 19:15 urbanecm@deploy1001: Finished scap: {{Gerrit|9529cf8d2570bbf6dd1e919c966f5954e39dbd67}}: {{Gerrit|b66ec9143bd96cbf3a20b70f6aa3f2d6d7963bb5}}: OOUI backport; {{Gerrit|93755a6a92923ae390e3a04b19421c8562568d2a}}: i18n changes for OAuth, removal of spam messages (duration: 42m 26s)
* 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
* 19:14 ejegg: updated payments-wiki from {{Gerrit|bf91f8adff}} to {{Gerrit|31a3de1130}}
* 12:22 elukey: drop /var/tmp/core files from ores100[2,4], root partition full
* 19:11 mutante: mw2335 - mw2339 - scap pull
* 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
* 18:39 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[5-9].codfw.wmnet
* 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - [[phab:T296563|T296563]]
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw233[6-9].codfw.wmnet
* 11:46 elukey: drop ores coredumps from ores1008
* 18:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw233[6-9].codfw.wmnet
* 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
* 18:33 urbanecm@deploy1001: Started scap: {{Gerrit|9529cf8d2570bbf6dd1e919c966f5954e39dbd67}}: {{Gerrit|b66ec9143bd96cbf3a20b70f6aa3f2d6d7963bb5}}: OOUI backport; {{Gerrit|93755a6a92923ae390e3a04b19421c8562568d2a}}: i18n changes for OAuth, removal of spam messages
* 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition
* 18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 18:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw233[5-9].codfw.wmnet
* 18:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 17:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
* 15:31 moritzm: updated stretch installer image to Stretch 9.13 release [[phab:T258407|T258407]]
* 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:52 XioNoX: add accept-data and remove bogus v6 IP from ulsfo sandbox vlan
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:50 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:49 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:20 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:19 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:36 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:32 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:28 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:17 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:05 ema: A:cp-text varnish ban ptwikiversity [[phab:T256750|T256750]]
* 12:01 ema: A:cp-text varnish ban frwiktionary [[phab:T256750|T256750]]
* 11:56 ema: A:cp-text varnish ban euwiki [[phab:T256750|T256750]]
* 11:54 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:52 Urbanecm: EU B&C window done
* 11:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 11:49 ema: A:cp-text force puppet run to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/615446 [[phab:T256750|T256750]]
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 15s)
* 11:42 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614889{{!}}Enable desktop improvements by default for testing group (round 1) (T254227)]] (duration: 01m 05s)
* 11:30 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 04s)
* 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:28 jdrewniak@deploy1001: Synchronized wmf-config/config: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 05s)
* 11:20 jdrewniak@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 05s)
* 11:18 jdrewniak@deploy1001: Synchronized dblists/desktop-improvements.dblist: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 18s)
* 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:39 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:24 jbond42: upload prometheus-swagger-exporter_0.3-1+deb10u1 to apt1001 buster repo
* 10:24 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:22 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 10:08 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 10:04 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 09:58 marostegui: Deploy MCR schema change on s4 codfw master (lag will appear on codfw) - [[phab:T238966|T238966]]
* 09:55 akosiaris: bump memory in codfw mobileapps another 20% [[phab:T218733|T218733]]
* 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:52 godog: centrallog1001 lvextend /srv by 130G
* 09:51 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:46 akosiaris: codfw mobileapps kubernetes traffic back to 96% [[phab:T218733|T218733]] again. scb pooled again.
* 09:46 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:43 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:40 akosiaris: increase codfw mobileapps kubernetes traffic to 100% [[phab:T218733|T218733]]
* 09:40 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:25 akosiaris: bump memory limits for mobileapps by 25% [[phab:T218733|T218733]]
* 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:10 jayme: updated docker-report to 0.0.7-1 on deneb
* 09:09 jayme: import docker-report 0.0.7-1 to buster-wikimedia
* 09:06 gehel: restarting blazegraph on all wdqs nodes - new vocabulary
* 08:48 dcausse: restarting blazegraph on wdqs1010 (testing new vocab)
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12017 and previous config saved to /var/cache/conftool/dbconfig/20200722-084613-marostegui.json
* 08:42 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 100% pooled in es4, reduce es1021 to weight 0 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12016 and previous config saved to /var/cache/conftool/dbconfig/20200722-084159-kormat.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12015 and previous config saved to /var/cache/conftool/dbconfig/20200722-083926-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12014 and previous config saved to /var/cache/conftool/dbconfig/20200722-083535-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12013 and previous config saved to /var/cache/conftool/dbconfig/20200722-083140-marostegui.json
* 08:30 kart_: Updated cxserver to 2020-07-20-200559-production ([[phab:T257674|T257674]])
* 08:28 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:25 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12012 and previous config saved to /var/cache/conftool/dbconfig/20200722-082309-marostegui.json
* 08:22 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12010 and previous config saved to /var/cache/conftool/dbconfig/20200722-082023-marostegui.json
* 08:19 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:16 akosiaris: increase codfw mobileapps kubernetes traffic to 96% [[phab:T218733|T218733]]. Take #2. Let's see if I can reproduce the weird increases in p99 latencies and figure out their cause
* 08:15 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 08:14 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 75% pooled in es4, reduce es1021 to weight 25 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12009 and previous config saved to /var/cache/conftool/dbconfig/20200722-081457-kormat.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12008 and previous config saved to /var/cache/conftool/dbconfig/20200722-081330-marostegui.json
* 08:12 moritzm: Turnilo switched to CAS
* 08:05 jayme: updated docker-report to 0.0.6-1 on deneb
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12007 and previous config saved to /var/cache/conftool/dbconfig/20200722-075749-marostegui.json
* 07:53 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 50% pooled in es4 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12006 and previous config saved to /var/cache/conftool/dbconfig/20200722-075312-kormat.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1084 to s1, depooled [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P12005 and previous config saved to /var/cache/conftool/dbconfig/20200722-075040-marostegui.json
* 07:49 jayme: import docker-report 0.0.6-1 to buster-wikimedia
* 07:40 jynus: stop db1145 for hw maintenance [[phab:T258249|T258249]]
* 06:47 elukey: update analytics-in4/6 filters on cr1/cr2 eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/614702)
* 06:26 marostegui: Stop MySQL on db1107
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to clone db1084', diff saved to https://phabricator.wikimedia.org/P12003 and previous config saved to /var/cache/conftool/dbconfig/20200722-060432-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P12002 and previous config saved to /var/cache/conftool/dbconfig/20200722-051607-marostegui.json


== 2020-07-21 ==
== 2021-11-26 ==
* 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump cirrus MLR models to latest (duration: 01m 06s)
* 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
* 23:13 Urbanecm: Evening backport window done
* 16:05 arnoldokoth: drain kubestage1001 node in prep for decommissioning
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|7a50168d54b5e86834606fb8d7880eb3a923ffd5}}: Updating UploadWizard template: PD-old-70-1923->PD-old-70-expired ([[phab:T258523|T258523]]) (duration: 01m 06s)
* 15:46 elukey: move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
* 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7acc9d966a07d589bb6aed5f801c9e1defc75fe1}}: Enable $wgWatchlistExpiry on testwiki ([[phab:T257506|T257506]]) (duration: 01m 08s)
* 14:30 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.1
* 14:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:02 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 06s)
* 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:01 catrope@deploy1001: Synchronized php-1.35.0-wmf.41/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 11s)
* 13:48 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:33 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.1 (duration: 41m 22s)
* 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:27 ejegg: restored new URL for TY page in payments-wiki settings
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:22 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 07s)
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:22 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 12:21 vgutierrez: restarting HAProxy on O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 18:21 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 12s)
* 11:41 akosiaris: [[phab:T296303|T296303]] cleanup weird state of calico-codfw cluster
* 18:21 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:17 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 17s)
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 18:16 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 11:39 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 18:13 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 05m 32s)
* 11:25 vgutierrez: restarting HAProxy on O:cache::(text{{!}}upload)_haproxy - [[phab:T290005|T290005]]
* 18:08 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
* 17:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.1
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
* 17:50 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 17:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 17:10 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.39 (duration: 16m 25s)
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
* 16:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2 (duration: 04m 54s)
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2
* 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 16:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase (duration: 10m 37s)
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 16:21 longma: 1.36.0-wmf.1 was branched at {{Gerrit|3a1faac3764ecae8dde813bd67a5a8e8f4975a85}} for [[phab:T257969|T257969]]
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 16:16 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
* 15:10 moritzm: draining restbase1027 for eventual reboot for kernel security update
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
* 15:09 godog: poweroff ms-be1024 for bbu replacement - [[phab:T257949|T257949]]
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
* 15:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 Amir1: killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
* 15:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 06:19 Amir1: killing lingering process from mwmaint to depooled db (db1160) that was depooled nine hours ago
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:01 vgutierrez: show a synthetic warning for traffic using ECDHE-RSA-AES128-SHA - [[phab:T258405|T258405]]
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:00 moritzm: draining restbase1026 for eventual reboot for kernel security update
* 14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:51 moritzm: draining restbase1025 for eventual reboot for kernel security update
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
* 14:35 akosiaris: decrease codfw mobileapps kubernetes traffic to 72% [[phab:T218733|T218733]]. Weird latency patterns exhibited when 92% was reached. See https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=34&fullscreen&orgId=1&from=1595338489749&to=1595342071227&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All
* 14:35 moritzm: draining restbase1024 for eventual reboot for kernel security update
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11994 and previous config saved to /var/cache/conftool/dbconfig/20200721-143204-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11993 and previous config saved to /var/cache/conftool/dbconfig/20200721-142634-marostegui.json
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11992 and previous config saved to /var/cache/conftool/dbconfig/20200721-141813-marostegui.json
* 14:16 moritzm: draining restbase1023 for eventual reboot for kernel security update
* 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:03 moritzm: draining restbase1022 for eventual reboot for kernel security update
* 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:55 moritzm: draining restbase1021 for eventual reboot for kernel security update
* 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11991 and previous config saved to /var/cache/conftool/dbconfig/20200721-135028-marostegui.json
* 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:46 moritzm: draining restbase1020 for eventual reboot for kernel security update
* 13:42 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 13:41 akosiaris: increase codfw mobileapps kubernetes traffic to 96% [[phab:T218733|T218733]]
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:15 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:03 moritzm: draining restbase1019 for eventual reboot for kernel security update
* 13:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:55 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 12:54 marostegui: Stop haproxy on dbproxy1012 - [[phab:T255408|T255408]]
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P11988 and previous config saved to /var/cache/conftool/dbconfig/20200721-121302-marostegui.json
* 12:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:25 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b96c7ea35557888c6cec2dd19768c246bff804b}}: Enable botpasswords at checkuserwiki and stewardwiki ([[phab:T258358|T258358]], [[phab:T258355|T258355]]) (duration: 00m 57s)
* 11:11 Urbanecm: Create bot_passwords table at checkuserwiki ([[phab:T258358|T258358]])
* 11:10 Urbanecm: Create bot_passwords table at stewardwiki ([[phab:T258355|T258355]])
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d5bb37c342310be5ca0b0e11a8490703867f4fd}}: Enable Vector opt in preference everywhere ([[phab:T254228|T254228]]) (duration: 00m 57s)
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11987 and previous config saved to /var/cache/conftool/dbconfig/20200721-110854-marostegui.json
* 11:00 effie: enable puppet on P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11986 and previous config saved to /var/cache/conftool/dbconfig/20200721-105852-marostegui.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11985 and previous config saved to /var/cache/conftool/dbconfig/20200721-104546-marostegui.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P11984 and previous config saved to /var/cache/conftool/dbconfig/20200721-103430-marostegui.json
* 10:20 effie: disable puppet on P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 10:13 effie: enable puppet on wtp*
* 10:02 marostegui: Analyze revision table on db1119 [[phab:T258480|T258480]]
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T258480|T258480]]', diff saved to https://phabricator.wikimedia.org/P11983 and previous config saved to /var/cache/conftool/dbconfig/20200721-100159-marostegui.json
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb [[phab:T218733|T218733]]
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb
* 09:59 effie: disable puppet on wtp* to merge 613307
* 09:58 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps
* 09:58 akosiaris: increase codfw mobileapps kubernetes traffic to 72.727272% [[phab:T218733|T218733]]
* 09:57 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:44 elukey: add term 'idp' to analytics-in4/6 filters on cr1-eqiad and cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/615160)
* 09:21 kormat@cumin1001: dbctl commit (dc=all): 'Re-pool es1020 at 25% in es4 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11982 and previous config saved to /var/cache/conftool/dbconfig/20200721-092126-kormat.json
* 08:37 akosiaris: increase codfw mobileapps kubernetes traffic to 47% [[phab:T218733|T218733]]
* 08:34 akosiaris@cumin1001: conftool action : set/weight=3; selector: dc=codfw,service=mobileapps,name=scb.*
* 08:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11980 and previous config saved to /var/cache/conftool/dbconfig/20200721-080842-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11979 and previous config saved to /var/cache/conftool/dbconfig/20200721-075233-marostegui.json
* 07:49 marostegui: Deploy schema change on db1087, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T256685|T256685]]
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T256685|T256685]]', diff saved to https://phabricator.wikimedia.org/P11978 and previous config saved to /var/cache/conftool/dbconfig/20200721-074843-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11977 and previous config saved to /var/cache/conftool/dbconfig/20200721-073757-marostegui.json
* 07:29 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es4 [[phab:T257847|T257847]] (duration: 00m 57s)
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1020 from es4 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11976 and previous config saved to /var/cache/conftool/dbconfig/20200721-072251-kormat.json
* 07:21 kormat@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 master [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11975 and previous config saved to /var/cache/conftool/dbconfig/20200721-072127-kormat.json
* 07:13 kormat: killing James_F('s script) on mwmaint1002
* 07:06 _joe_: systemctl reset-failed on deneb, the usual known issue with releng image reporting
* 07:03 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es4 [[phab:T257847|T257847]] (duration: 01m 00s)
* 06:59 kormat: Starting es4 failover from es1020 to es1021 [[phab:T257847|T257847]]
* 06:54 kormat@cumin1001: dbctl commit (dc=all): 'Set es1021 to weight 50 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11974 and previous config saved to /var/cache/conftool/dbconfig/20200721-065457-kormat.json
* 06:54 marostegui: Pool db1119 into enwiki with MCR schema change done - [[phab:T238966|T238966]]
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11973 and previous config saved to /var/cache/conftool/dbconfig/20200721-065430-marostegui.json
* 06:27 _joe_: systemctl reset-failed on lists1001, a network interface had been failing for a month
* 06:26 _joe_: enabling notifications for lists1001
* 06:23 _joe_: systemctl reset-failed on both centrallogs
* 02:43 eileen: civicrm revision changed from {{Gerrit|7f1e7d8e38}} to {{Gerrit|cc5d17fbaf}}, config revision is {{Gerrit|23460676f6}}
* 00:02 ryankemper: Began Elasticsearch reindex job on index `dewiki_content` across [`eqiad`, `codfw`, `cloudelastic`], on `rkemper@mwmaint1002` under tmux session `reindex`. Should complete in <24 hours
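
The mobileapps traffic figures logged by akosiaris (25% with scb at weight 8, 47% with scb at weight 3, 72.727272% with scb at weight 1 and again with every node at weight 10) fit a split proportional to total pooled weight. The sketch below reproduces that arithmetic. It is an illustration only: the node counts (8 kubernetes, 3 scb) are hypothetical values chosen because the logged percentages imply an 8:3 ratio, the kubernetes weight of 1 before the final change is assumed, and the proportional-split model is an assumption rather than a description of the actual load balancer.

```python
from fractions import Fraction

# Assumption: traffic is divided in proportion to total pooled weight.
# Assumption: 8 kubernetes nodes and 3 scb nodes. Only the 8:3 ratio is
# implied by the logged percentages; the real node counts are not in the log.
K8S_NODES = 8
SCB_NODES = 3


def kubernetes_share(k8s_weight: int, scb_weight: int) -> Fraction:
    """Fraction of traffic reaching kubernetes for the given per-node weights."""
    k8s_total = K8S_NODES * k8s_weight
    scb_total = SCB_NODES * scb_weight
    return Fraction(k8s_total, k8s_total + scb_total)


def as_percent(share: Fraction) -> float:
    """Convert an exact fraction to a percentage for display."""
    return float(share * 100)


# (kubernetes weight, scb weight) pairs taken from the conftool entries;
# the kubernetes weight of 1 in the first three steps is assumed.
for k8s_w, scb_w in [(1, 8), (1, 3), (1, 1), (10, 10)]:
    pct = as_percent(kubernetes_share(k8s_w, scb_w))
    print(f"k8s weight={k8s_w}, scb weight={scb_w}: {pct:.6f}% to kubernetes")
```

Under these assumptions, moving every node to weight 10 leaves the ratio unchanged, which matches the "Traffic level remains at 72.727272%" entry.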


== 2020-07-20 ==
== 2021-11-25 ==
* 23:49 eileen: tools revision changed from {{Gerrit|b915d8efbd}} to {{Gerrit|22550f38c5}}
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
* 23:34 ejegg: updated fundraising CiviCRM from {{Gerrit|8b09c87ce2}} to {{Gerrit|7f1e7d8e38}}
* 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/ProofreadPage/ProofreadPage.namespaces.php: {{Gerrit|03ed74f0b9b8f55d01f9112c31f2f6ea17990f9c}}: Add ProofreadPage namespace translation for lij ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:06 Urbanecm: run mwscript namespaceDupes.php --wiki=lijwikisource -- fix ([[phab:T257672|T257672]])
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2147774caaa0819f8b5d71cc16bc021d94677702}}: Add English aliases for WS-specific namespaces to lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 22:59 ryankemper@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 613669: cirrussearch: Allow 2 dewiki->content shards/node {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/613669 (duration: 00m 57s)
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17871 and previous config saved to /var/cache/conftool/dbconfig/20211125-192850-ladsgroup.json
* 21:53 eileen: tools revision changed from {{Gerrit|40d52a0008}} to {{Gerrit|b915d8efbd}}
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17870 and previous config saved to /var/cache/conftool/dbconfig/20211125-191345-ladsgroup.json
* 21:15 sbassett: Revised mitigation deployed for [[phab:T257687|T257687]]
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17869 and previous config saved to /var/cache/conftool/dbconfig/20211125-185841-ladsgroup.json
* 20:07 eileen: tools revision changed from {{Gerrit|711d671600}} to {{Gerrit|40d52a0008}}
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17868 and previous config saved to /var/cache/conftool/dbconfig/20211125-184336-ladsgroup.json
* 19:10 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 00m 07s)
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17867 and previous config saved to /var/cache/conftool/dbconfig/20211125-172714-ladsgroup.json
* 19:10 mforns@deploy1001: Started deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 19:09 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 05m 46s)
* 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 19:03 mforns@deploy1001: Started deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17866 and previous config saved to /var/cache/conftool/dbconfig/20211125-172707-ladsgroup.json
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|df2584f181f08da0e1191f97e619e912e587b48d}}: Switch $wgUrlShortenerDomainsWhitelist --> $wgUrlShortenerAllowedDomains ([[phab:T255491|T255491]]) (duration: 00m 57s)
* 17:12 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dfed4727c6f9e003f9e1949b2995a0cf0ad4f1cc}}: Adding rollbacker group for arzwiki ([[phab:T258100|T258100]]) (duration: 00m 57s)
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17864 and previous config saved to /var/cache/conftool/dbconfig/20211125-171202-ladsgroup.json
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee7ac95e16f55e850b318f7354842795e08e0270}}: Change of rollbacker group settings at jawiki ([[phab:T258339|T258339]]) (duration: 00m 57s)
* 16:57 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6 (duration: 06m 59s)
* 17:36 ejegg: updated payments-wiki settings to point TY page at new URL
* 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17863 and previous config saved to /var/cache/conftool/dbconfig/20211125-165657-ladsgroup.json
* 16:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map (duration: 00m 25s)
* 16:50 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6
* 16:31 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map
* 16:49 jynus@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P17862 and previous config saved to /var/cache/conftool/dbconfig/20211125-164941-jynus.json
* 16:27 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]. Take #2
* 16:46 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next (duration: 01m 04s)
* 16:27 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 16:45 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next
* 15:59 elukey: restart airflow-webserver/scheduler to pick up TLS to mysql settings
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17861 and previous config saved to /var/cache/conftool/dbconfig/20211125-164153-ladsgroup.json
* 15:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163++', diff saved to https://phabricator.wikimedia.org/P17860 and previous config saved to /var/cache/conftool/dbconfig/20211125-161833-jynus.json
* 15:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163+', diff saved to https://phabricator.wikimedia.org/P17859 and previous config saved to /var/cache/conftool/dbconfig/20211125-161404-jynus.json
* 15:17 hnowlan: draining and restarting sessionstore2002
* 16:10 klausman: restarting pybal on lvs2009 [[phab:T289835|T289835]]
* 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:57 vgutierrez: restarting pybal on lvs2010 - [[phab:T289835|T289835]]
* 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:55 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P17856 and previous config saved to /var/cache/conftool/dbconfig/20211125-155538-jynus.json
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:47 jynus: reenable gtid on db1163
* 15:13 jynus: dropping and recreating nagios@localhost users on all m1 servers
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17853 and previous config saved to /var/cache/conftool/dbconfig/20211125-152906-ladsgroup.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:09 hnowlan: draining and restarting sessionstore2001
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17852 and previous config saved to /var/cache/conftool/dbconfig/20211125-152858-ladsgroup.json
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1001.eqiad.wmnet
* 15:09 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:19 klausman@cumin1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubesvc
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17851 and previous config saved to /var/cache/conftool/dbconfig/20211125-151354-ladsgroup.json
* 15:08 moritzm: draining restbase2023 for eventual reboot for kernel security update
* 15:13 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping1001.eqiad.wmnet
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3001.esams.wmnet
* 15:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:05 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping3001.esams.wmnet
* 14:56 moritzm: draining restbase2022 for eventual reboot for kernel security update
* 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2001.codfw.wmnet
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17850 and previous config saved to /var/cache/conftool/dbconfig/20211125-145849-ladsgroup.json
* 14:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:54 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping2001.codfw.wmnet
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17849 and previous config saved to /var/cache/conftool/dbconfig/20211125-144344-ladsgroup.json
* 14:52 hnowlan: draining and restarting sessionstore1003
* 14:42 XioNoX: Update ping redirect to point to new ping VMs - [[phab:T295767|T295767]]
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jayme: uncordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet - [[phab:T293729|T293729]]
* 14:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1002.eqiad.wmnet
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:32 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping1002.eqiad.wmnet
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2002.codfw.wmnet
* 14:47 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:28 Amir1: killing lingering process from mwmaint to depooled db1147
* 14:47 moritzm: draining restbase2021 for eventual reboot for kernel security update
* 13:20 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping2002.codfw.wmnet
* 14:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3002.esams.wmnet
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:05 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping3002.esams.wmnet
* 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 14:36 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}} (duration: 03m 50s)
* 12:14 arturo: update repo bullseye-wikimedia/thirdparty/ceph-octopus ([[phab:T296175|T296175]])
* 14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:14 jynus: disable temp. gtid on db1163
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:11 jynus@cumin1001: dbctl commit (dc=all): 'Temp. depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17847 and previous config saved to /var/cache/conftool/dbconfig/20211125-121138-jynus.json
* 14:34 hnowlan: starting drain and restart of sessionstore hosts for new kernel
* 12:04 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load even more', diff saved to https://phabricator.wikimedia.org/P17846 and previous config saved to /var/cache/conftool/dbconfig/20211125-120435-jynus.json
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 14:32 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}}
* 11:56 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load', diff saved to https://phabricator.wikimedia.org/P17845 and previous config saved to /var/cache/conftool/dbconfig/20211125-115602-jynus.json
* 14:26 moritzm: draining restbase2020 for eventual reboot for kernel security update
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17844 and previous config saved to /var/cache/conftool/dbconfig/20211125-110443-ladsgroup.json
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 14:23 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17843 and previous config saved to /var/cache/conftool/dbconfig/20211125-110435-ladsgroup.json
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17842 and previous config saved to /var/cache/conftool/dbconfig/20211125-104930-ladsgroup.json
* 14:14 moritzm: draining restbase2019 for eventual reboot for kernel security update
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17841 and previous config saved to /var/cache/conftool/dbconfig/20211125-103425-ladsgroup.json
* 14:08 ema: lvs101[34] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 10:25 vgutierrez: rolling restart of varnish and HAProxy on cp2042.codfw.wmnet,cp1090.eqiad.wmnet,cp[5012].eqsin.wmnet,cp3065.esams.wmnet,cp[4026,4032].ulsfo.wmnet to disable PROXY protocol - [[phab:T290005|T290005]]
* 14:07 ema: lvs1016 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17840 and previous config saved to /var/cache/conftool/dbconfig/20211125-101921-ladsgroup.json
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:55 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 14:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:59 ema: lvs300[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:57 ema: lvs3007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:39 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:50 ema: lvs500[12] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:48 moritzm: draining restbase2018 for eventual reboot for kernel security update
* 09:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:29 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 13:47 ema: lvs5003 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:27 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 13:44 ema: lvs200[78] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:24 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 13:42 ema: lvs2010 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:23 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 13:31 ema: lvs400[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:16 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 13:27 moritzm: draining restbase2017 for eventual reboot for kernel security update
* 09:10 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:24 ema: lvs4007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 09:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:59 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:09 moritzm: draining restbase2016 for eventual reboot for kernel security update
* 08:51 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:50 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17837 and previous config saved to /var/cache/conftool/dbconfig/20211125-084834-ladsgroup.json
* 13:07 moritzm: reset broken ifup systemd states on puppetdb* hosts
* 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:59 Urbanecm: creating arywiki ([[phab:T257674|T257674]]), lijwikisource ([[phab:T257672|T257672]]), sysop_itwiki ([[phab:T256545|T256545]]) done
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:59 moritzm: draining restbase2015 for eventual reboot for kernel security update
* 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:56 Urbanecm: Create Daimona Eaytoy at sysop_itwiki ([[phab:T256545|T256545]])
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:50 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 08:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:48 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating sysop_itwiki ([[phab:T256545|T256545]])
* 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:46 urbanecm@deploy1001: Synchronized dblists: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 12:40 moritzm: draining restbase2014 for eventual reboot for kernel security update
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lijwikisource ([[phab:T257672|T257672]])
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:30 urbanecm@deploy1001: Synchronized dblists: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 56s)
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:28 urbanecm@deploy1001: Synchronized dblists/rtl.dblist: Add arywiki to rtl.dblist ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:27 moritzm: draining restbase2013 for eventual reboot for kernel security update
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 12:27 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:21 urbanecm@deploy1001: Synchronized langlist: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:17 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arywiki ([[phab:T257674|T257674]])
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:16 urbanecm@deploy1001: Synchronized dblists: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 08:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:02 moritzm: installing qemu security updates on buster
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|946bf3d239f278b4e099f5dec676f5e2be61d8ca}}: Update brwikimedia logo and add upscaled versions (config) ([[phab:T257925|T257925]]) (duration: 00m 57s)
* 08:18 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 08:17 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:49 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/bnwikimedia.png'
* 08:14 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f7560b6061dd3a60ccf56c916ebf70a3f104bea7}}: Update brwikimedia logo and add upscaled versions ([[phab:T257925|T257925]]) (duration: 00m 56s)
* 08:13 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b97a06fa2e9a06c251a9c1fd2ddd9beec01a683}}: Set $wgUrlShortenerAllowedDomains for all wikis ([[phab:T258134|T258134]]) (duration: 00m 57s)
* 08:09 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:42 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 08:08 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c12f1dee6b9888849c64312c2a4fd65ecbd4091e}}: Remove wgPopupsPageBlacklist config setting ([[phab:T254676|T254676]]) (duration: 00m 57s)
* 08:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:35 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript createAndPromote.php testwikidatawiki --custom-groups=interface-admin --force 'Lucas Werkmeister (WMDE)'
* 08:03 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 11:34 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 08:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 11:25 Urbanecm: mwscript namespaceDupes.php --wiki=kowikiquote  --fix ([[phab:T255031|T255031]])
* 08:00 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3719668511231589b4fc6a723ccdfa772068ad5f}}: Add NamespaceAliases for kowikiquote ([[phab:T255031|T255031]]) (duration: 00m 57s)
* 07:57 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc5671a90c65b66989e470fc41225986b2ec9fb5}}: Add media.farsnews.ir to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T253800|T253800]]) (duration: 00m 57s)
* 07:56 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 11:18 Urbanecm: Run mwscript updateCollation.php --wiki=bswiktionary --previous-collation=uppercase in a tmux session at mwmaint1002 ([[phab:T258346|T258346]])
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0c784784d75c2bbfb570495a6a097d4c44cbe6b3}}: Set $wgCategoryCollation to uca-bs-u-kn on Bosnian Wiktionary ([[phab:T258346|T258346]]) (duration: 00m 58s)
* 07:51 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore{{!}}sessionstore)
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6830723b0ad5031e67062ba838f09cd07c2b97a1}}: Convert ukwikisource ns:250 and ns:251 to have subpages ([[phab:T255930|T255930]]) (duration: 00m 57s)
* 07:49 marostegui: Stop mysql on db1133 to clone db1128 as a test host [[phab:T295965|T295965]]
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7a6215d06aff6cb0a75701292d8147f006d9e4}}: Create closer group at itwikinews ([[phab:T257927|T257927]]) (duration: 00m 57s)
* 07:49 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 10:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:48 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 10:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:47 jayme: elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:35 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:48 moritzm: rebooting releases* hosts for kernel security update
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 56s)
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 59s)
* 07:29 elukey_: elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114', diff saved to https://phabricator.wikimedia.org/P11962 and previous config saved to /var/cache/conftool/dbconfig/20200720-103058-marostegui.json
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11961 and previous config saved to /var/cache/conftool/dbconfig/20200720-094609-marostegui.json
* 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11960 and previous config saved to /var/cache/conftool/dbconfig/20200720-093154-marostegui.json
* 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 09:25 godog: update compiler facts
* 07:20 jelto@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntax
* 09:17 jayme: updating envoyproxy to 1.14.4-1 on all eqiad hosts
* 07:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11959 and previous config saved to /var/cache/conftool/dbconfig/20200720-091119-marostegui.json
* 07:17 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:04 jayme: updating envoyproxy to 1.14.4-1 on all codfw hosts
* 07:10 jelto: downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 07:54 moritzm: installing libopenmpt security updates
* 07:09 jelto: start re-deploy procedure in eqiad Kubernetes [[phab:T251305|T251305]]
* 07:51 jayme: updating envoyproxy to 1.14.4-1 on all non mw and restbase hosts
* 06:31 marostegui: Restart tendril's DB
* 07:29 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 - [[phab:T255408|T255408]]
* 05:51 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 07:19 marostegui: Drop unused reviewdb database - [[phab:T255715|T255715]]
* 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s)
* 06:55 elukey: restart matomo1002's mariadb to pick up new TLS settings
* 04:43 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P11958 and previous config saved to /var/cache/conftool/dbconfig/20200720-065438-marostegui.json
* 04:40 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS
* 06:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score/includes/Score.php: reverting Reedy's temporary patch for hardcoding the lilypond version (duration: 00m 57s)
* 04:39 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:07 tstarling@deploy1001: Finished scap: fixing missing message from previous sync-dir (duration: 29m 57s)
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11957 and previous config saved to /var/cache/conftool/dbconfig/20200720-055614-marostegui.json
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11956 and previous config saved to /var/cache/conftool/dbconfig/20200720-054747-marostegui.json
* 04:35 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s)
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11955 and previous config saved to /var/cache/conftool/dbconfig/20200720-053816-marostegui.json
* 04:30 ryankemper: [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'`
* 05:37 tstarling@deploy1001: Started scap: fixing missing message from previous sync-dir
* 04:27 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet
* 05:30 tstarling@deploy1001: scap sync-l10n completed (1.35.0-wmf.41) (duration: 02m 44s)
* 04:25 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93
* 05:25 marostegui: Deploy MCR schema change on enwiki on db1119 - [[phab:T238966|T238966]]
* 04:25 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003`
* 05:24 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond with better error message (duration: 00m 57s)
* 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11953 and previous config saved to /var/cache/conftool/dbconfig/20200720-051846-marostegui.json
* 02:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster
* 05:18 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score: better error message for disabling of Score (duration: 01m 10s)
* 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster
* 02:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster
* 02:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster
* 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster
* 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster
* 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster
* 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster


== 2020-07-19 ==
== 2021-11-24 ==
* 19:16 marostegui: Upgrade and reboot db1085 [[phab:T258360|T258360]]
* 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
* 18:57 marostegui: Start mysql on db1082 [[phab:T258336|T258336]]
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
* 18:51 marostegui: Upgrade and reboot db1082 [[phab:T258336|T258336]]
* 23:44 mutante: puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet {{!}}  sudo install_console gitlab-runner1001.eqiad.wmnet ([[phab:T295481|T295481]])
* 18:45 cdanis@cumin1001: dbctl commit (dc=all): 'db1085 also crashed', diff saved to https://phabricator.wikimedia.org/P11952 and previous config saved to /var/cache/conftool/dbconfig/20200719-184511-cdanis.json
* 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS [[phab:T295481|T295481]]
* 18:06 Urbanecm: Run mwscript emptyUserGroup.php --wiki=testwiki contestadmin ([[phab:T256555|T256555]])
* 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
* 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete  - to fix Icinga alert about large files in client bucket
* 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 23:03 mutante: wcqs1001 -  sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row $<nowiki>{</nowiki>row<nowiki>}</nowiki>: $(sudo gnt-instance list -o name -F "pnode.group == 'row_$<nowiki>{</nowiki>row<nowiki>}</nowiki>'" {{!}} wc -l) VMs"; done
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org
* 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster
* 22:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster
* 22:38 mutante: running decom cookbook on gitlab-runner1001.wikimedia.org VM, which was in state "ADMIN_down" and not used yet, to make room to recreate it as gitlab-runner1001.eqiad.wmnet [[phab:T295481|T295481]]
* 22:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org
* 22:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster
* 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster
* 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 legoktm@deploy1002: Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s)
* 21:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster
* 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS buster
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:54 legoktm@deploy1002: Synchronized wmf-config/: Update configuration related to disabling Score functionality (duration: 00m 57s)
* 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS buster
* 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17834 and previous config saved to /var/cache/conftool/dbconfig/20211124-194857-ladsgroup.json
* 19:38 razzi: `sudo maintain-views --all-databases --replace-all` on clouddb1018 for [[phab:T292594|T292594]]
* 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17833 and previous config saved to /var/cache/conftool/dbconfig/20211124-193352-ladsgroup.json
* 19:19 razzi: run `maintain-views --all-databases --replace-all` on clouddb1013 for [[phab:T292594|T292594]]
* 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17832 and previous config saved to /var/cache/conftool/dbconfig/20211124-191847-ladsgroup.json
* 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17831 and previous config saved to /var/cache/conftool/dbconfig/20211124-190343-ladsgroup.json
* 18:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2002.codfw.wmnet
* 18:51 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2002.codfw.wmnet
* 18:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2001.codfw.wmnet
* 18:43 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2001.codfw.wmnet
* 18:30 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2001.codfw.wmnet
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17830 and previous config saved to /var/cache/conftool/dbconfig/20211124-174723-ladsgroup.json
* 17:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17829 and previous config saved to /var/cache/conftool/dbconfig/20211124-174615-ladsgroup.json
* 17:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741134{{!}}rdbms: Add full query to transaction profiler (T295706)]] (duration: 00m 56s)
* 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:34 jhathaway@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=puppetboard
* 17:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17828 and previous config saved to /var/cache/conftool/dbconfig/20211124-173110-ladsgroup.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
* 17:22 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
* 17:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum2001.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum2001.codfw.wmnet
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17827 and previous config saved to /var/cache/conftool/dbconfig/20211124-171604-ladsgroup.json
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
* 17:08 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
* 17:05 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399] (duration: 06m 45s)
* 17:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2003.codfw.wmnet
* 17:01 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17826 and previous config saved to /var/cache/conftool/dbconfig/20211124-170100-ladsgroup.json
* 17:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399] (duration: 00m 07s)
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399] (duration: 32m 50s)
* 16:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
* 16:50 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:44 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2005.codfw.wmnet
* 16:43 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:41 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2005.codfw.wmnet
* 16:41 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:40 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:38 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2006.codfw.wmnet
* 16:36 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2002.codfw.wmnet
* 16:36 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2006.codfw.wmnet
* 16:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741132{{!}}rdbms: Make TransactionProfiler logs more useful (T295706)]] (duration: 00m 57s)
* 16:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2002.codfw.wmnet
* 16:33 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2004.codfw.wmnet
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2003.codfw.wmnet
* 16:31 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2004.codfw.wmnet
* 16:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2003.codfw.wmnet
* 16:25 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2001.codfw.wmnet
* 16:25 mforns@deploy1002: Started deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399]
* 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
* 16:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2001.codfw.wmnet
* 16:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
* 16:19 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
* 16:16 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
* 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 Amir1: start of "foreachwikiindblist s3 migrateRevisionActorTemp.php --sleep=2" on mwmaint1002 in a screen. It will take a month or so ([[phab:T275246|T275246]])
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 btullis: systemctl reset-failed ifup@ens5.service on schema2004 [[phab:T273026|T273026]]
* 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2004.codfw.wmnet
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17821 and previous config saved to /var/cache/conftool/dbconfig/20211124-154533-ladsgroup.json
* 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17820 and previous config saved to /var/cache/conftool/dbconfig/20211124-154236-ladsgroup.json
* 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon2002.codfw.wmnet
* 15:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2004.codfw.wmnet
* 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2003.codfw.wmnet
* 15:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon2002.codfw.wmnet
* 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc2001.wikimedia.org
* 15:34 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2003.codfw.wmnet
* 15:32 papaul: reboot ms-be2058 for firmware upgrade
* 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc2001.wikimedia.org
* 15:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2001.codfw.wmnet
* 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17819 and previous config saved to /var/cache/conftool/dbconfig/20211124-152731-ladsgroup.json
* 15:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2001.codfw.wmnet
* 15:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode2001.codfw.wmnet
* 15:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode2001.codfw.wmnet
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab2001.wikimedia.org
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17817 and previous config saved to /var/cache/conftool/dbconfig/20211124-151226-ladsgroup.json
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2001.codfw.wmnet
* 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2001.codfw.wmnet
* 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17815 and previous config saved to /var/cache/conftool/dbconfig/20211124-145721-ladsgroup.json
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 14:39 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2031.codfw.wmnet
* 14:36 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2031.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2001.wikimedia.org
* 14:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:31 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2030.codfw.wmnet
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:28 godog: systemctl reset-failed ifup@ens5.service on logstash2024 [[phab:T273026|T273026]]
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2001.wikimedia.org
* 14:26 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2030.codfw.wmnet
* 14:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp2001.wikimedia.org
* 14:21 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2025.codfw.wmnet
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:15 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2025.codfw.wmnet
* 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2001.wikimedia.org
* 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2024.codfw.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2001.wikimedia.org
* 14:00 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2024.codfw.wmnet
* 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM serpens.wikimedia.org
* 13:55 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2023.codfw.wmnet
* 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM serpens.wikimedia.org
* 13:49 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2023.codfw.wmnet
* 13:41 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2006.codfw.wmnet
* 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2006.codfw.wmnet
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17813 and previous config saved to /var/cache/conftool/dbconfig/20211124-133809-ladsgroup.json
* 13:37 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2005.codfw.wmnet
* 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17812 and previous config saved to /var/cache/conftool/dbconfig/20211124-133628-ladsgroup.json
* 13:36 XioNoX: add Jayme r/o user to all network devices
* 13:35 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2005.codfw.wmnet
* 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
* 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2004.codfw.wmnet
* 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp2001.wikimedia.org
* 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp2001.wikimedia.org
* 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17811 and previous config saved to /var/cache/conftool/dbconfig/20211124-131519-ladsgroup.json
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17810 and previous config saved to /var/cache/conftool/dbconfig/20211124-130200-ladsgroup.json
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt2001.wikimedia.org
* 12:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt2001.wikimedia.org
* 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana2001.codfw.wmnet
* 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana2001.codfw.wmnet
* 12:48 jbond: enable puppet post puppetdb reboot
* 12:48 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
* 12:46 jelto@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17809 and previous config saved to /var/cache/conftool/dbconfig/20211124-124420-ladsgroup.json
* 12:43 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
* 12:37 jbond: disable puppet for puppetdb reboot
* 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2002.wikimedia.org
* 12:29 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2002.wikimedia.org
* 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2001.wikimedia.org
* 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2001.wikimedia.org
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases2002.codfw.wmnet
* 12:23 awight: EU scap deployment finished
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases2002.codfw.wmnet
* 12:21 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737195{{!}}Replace global with parent scope]] (duration: 00m 55s)
* 12:16 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737193{{!}}[lint] fully-qualify classname]] (duration: 00m 55s)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb2001.codfw.wmnet
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb2001.codfw.wmnet
* 12:10 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:740766{{!}}VisualEditor template dialog: new sidebar and inline descriptions (T284203, T286992)]] (duration: 00m 57s)
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2001.wikimedia.org
* 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:03 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2001.wikimedia.org
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox-dev2001.wikimedia.org
* 12:02 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:01 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox-dev2001.wikimedia.org
* 11:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
* 11:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
* 11:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2003.codfw.wmnet
* 11:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 11:49 moritzm: systemctl reset-failed ifup@ens5.service on poolcounter2003 [[phab:T273026|T273026]]
* 11:48 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2003.codfw.wmnet
* 11:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2004.codfw.wmnet
* 11:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 11:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2004.codfw.wmnet
* 11:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:35 godog: bounce apache2 on logstash1025
* 11:35 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:32 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 Amir1: optimizing image.commonswiki in db1141 ([[phab:T296143|T296143]])
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17808 and previous config saved to /var/cache/conftool/dbconfig/20211124-112539-ladsgroup.json
* 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
* 11:23 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
* 11:15 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
* 11:13 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2002.codfw.wmnet
* 11:05 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2002.codfw.wmnet
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2001.codfw.wmnet
* 10:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 10:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2001.codfw.wmnet
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui2001.codfw.wmnet
* 10:48 XioNoX: rollback: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui2001.codfw.wmnet
* 10:47 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:46 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people2002.codfw.wmnet
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people2002.codfw.wmnet
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping2001.codfw.wmnet
* 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping2001.codfw.wmnet
* 10:27 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 XioNoX: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:24 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:17 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:14 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:13 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:12 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:06 jelto: downtime PyBal backends health check for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2002.codfw.wmnet
* 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2002.codfw.wmnet
* 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 10:02 vgutierrez: repool cp5006 - [[phab:T290005|T290005]]
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2001.codfw.wmnet
* 10:00 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2001.codfw.wmnet
* 09:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor2002.codfw.wmnet
* 09:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor2002.codfw.wmnet
* 09:54 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:53 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 09:53 vgutierrez: restart varnish/haproxy on cp5006 - [[phab:T290005|T290005]]
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 09:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install2003.wikimedia.org
* 09:49 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install2003.wikimedia.org
* 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx2001.wikimedia.org
* 09:45 vgutierrez: depool cp5006 - [[phab:T290005|T290005]]
* 09:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx2001.wikimedia.org
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet2002.codfw.wmnet
* 09:30 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=apple-search,name=eqiad
* 09:24 jelto@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxhighlight{{!}}she
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid2002.codfw.wmnet
* 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid2002.codfw.wmnet
* 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM deneb.codfw.wmnet
* 09:08 _joe_: switching search.wikimedia.org to be served by the apple-search service
* 09:04 jelto: start re-deploy procedure in codfw Kubernetes [[phab:T251305|T251305]]
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM deneb.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 _joe_: repooling cp2027
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:741082{{!}}Set actor migration to write both on all wikis (T275246)]] (duration: 00m 57s)
* 08:51 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:41 vgutierrez: depool cp2027
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 07:23 elukey: reboot kubernetes1018 (role::insetup) to verify negotiated speed of eth interface
* 07:12 elukey: drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-{{Gerrit|bebe254120f8}} and other blockmgr-* dirs on stat1006 to free space on the root partition
* 06:47 Amir1: running optimize table with replication on db1155:3314 ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17807 and previous config saved to /var/cache/conftool/dbconfig/20211124-063228-root.json
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17806 and previous config saved to /var/cache/conftool/dbconfig/20211124-061725-root.json
* 06:05 marostegui: Upgrade db1128's kernel [[phab:T288720|T288720]]
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17805 and previous config saved to /var/cache/conftool/dbconfig/20211124-060221-root.json
* 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17804 and previous config saved to /var/cache/conftool/dbconfig/20211124-054718-root.json
* 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS buster


== 2020-07-18 ==
== 2021-11-23 ==
* 21:41 shdubsh: restart logstash on logstash200[456]
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
* 21:14 shdubsh: bounce logstash on logstash1007
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
* 21:10 shdubsh: bounce logstash on logstash1008
* 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
* 21:06 shdubsh: bounce logstash on logstash1009
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
* 20:52 marostegui: Due to db1082 crash there will be replication lag on s5 on labsdb hosts - [[phab:T258336|T258336]]
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
* 20:37 cdanis@cumin1001: dbctl commit (dc=all): 'depool db1082, it crashed', diff saved to https://phabricator.wikimedia.org/P11951 and previous config saved to /var/cache/conftool/dbconfig/20200718-203704-cdanis.json
* 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
* 00:13 dpifke: Performing one-time expiration of ArcLamp files older than 40 days (normal retention is 45 days), to solve disk space issue until either Ganeti issue is solved or compressed logfile support is merged.
* 21:58 tgr: UTC evening deploys done
* 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
* 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
* 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
* 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
* 21:47 tgr@deploy1002: Started scap: (no justification provided)
* 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740777{{!}}Add Image: Validate GEInfoboxTemplates size (T294518)]] (duration: 00m 56s)
* 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: [[gerrit:740776{{!}}Structured task caching/filtering cherry-picks step 3]] (duration: 00m 55s)
* 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740775{{!}}Structured task caching/filtering cherry-picks step 2]] (duration: 00m 57s)
* 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default ([[phab:T296270|T296270]]) (duration: 00m 57s)
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|7d5f779a73594bb11f359bda055f2c7af8e92feb}}: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
* 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|c26e407118e1cd8e1e3fea6e2f4e3e43a609ea62}}: GrowthExperiments backports (duration: 01m 03s)
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 2/2) (duration: 00m 56s)
* 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 1/2) (duration: 00m 56s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3993aacbfdbbfb6cdcc198ce369bf08b32ace865}}: Increase reading depth sampling rate to .1% ([[phab:T294777|T294777]]) (duration: 00m 57s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:25 ejegg: updated SmashPig standalone (IPN listener) from {{Gerrit|be68299b}} -> {{Gerrit|211f8e65}}
* 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:18 cmjohnson1: upgrading msw-c1-eqiad [[phab:T259758|T259758]]
* 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 [[phab:T273026|T273026]]
* 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships ([[phab:T243037|T243037]])
* 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 17:35 ebernhardson: [[phab:T295478|T295478]] start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
* 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 17:31 cmjohnson1: upgrading msw's in row D eqiad [[phab:T259758|T259758]]
* 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
* 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
* 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
* 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
* 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
* 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
* 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
* 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
* 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
* 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
* 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
* 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad [[phab:T259758|T259758]]
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
</