
Difference between revisions of "Server Admin Log"

imported>Labslogbot
(running mwscript populateContentModel.php --wiki=enwiki --ns=all --table=page (legoktm))
imported>Stashbot
(mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version)
 
== 2015-07-24 ==
== 2021-12-04 ==
* 21:57 legoktm: running mwscript populateContentModel.php --wiki=enwiki --ns=all --table=page
* 01:14 mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version
* 20:36 logmsgbot: krenair Synchronized php-1.26wmf15/extensions/VisualEditor/modules/ve-mw/ui: https://gerrit.wikimedia.org/r/#/c/226907/ (duration: 00m 12s)
* 00:54 mutante: rebooting mx2001
* 19:40 awight: updated DjangoBannerStats from 3db799dc8705c728c7261ae433e8197f5498fa1b to 57a0392b3f43b65050b01a0465e120ed609a769e
* 00:31 jynus: manually restarting clamav on otrs1001 after being killed
* 19:08 YuviPanda: remove others20150724183453 on labstore1002
* 18:39 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ib7c7861e: Point to a no-op /beacon URL rather than Special:RecordImpression (duration: 00m 12s)
* 18:38 ori: Merging Ib7c7861e: Point to a no-op /beacon URL rather than Special:RecordImpression
* 18:30 ori: Depooled Precise image scalers (mw1159 and mw1160)
* 18:29 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Idfe1fa60: testwiki: Point to a no-op /beacon URL rather than Special:RecordImpression (duration: 00m 12s)
* 18:17 YuviPanda: removed labstore/others20150724 on labstore1002
* 18:15 YuviPanda: running others20150724 on labstore1002
* 16:51 bd808: Upgraded logstash1006 to elasticsearch 1.7.0
* 16:48 bd808: Upgraded logstash1005 to elasticsearch 1.7.0
* 16:36 bd808: Upgraded logstash1004 to elasticsearch 1.7.0
* 16:27 bd808: Upgraded logstash1003 to elasticsearch 1.7.0
* 16:26 bd808: Upgraded logstash1002 to elasticsearch 1.7.0
* 16:25 bd808: Upgraded logstash1001 to elasticsearch 1.7.0
* 13:44 cmjohnson1: swapping failed disk db1058
* 13:11 cmjohnson1: swapping ssds in restbase1007
* 12:47 hashar: restarting Jenkins
* 12:47 hashar: Jenkins: switching gearman plugin from our custom compiled 0.1.1-9-g08e9c42-change_192429_2 to upstream 0.1.2. They are actually the exact same versions.
* 10:23 logmsgbot: legoktm Synchronized php-1.26wmf15/extensions/AbuseFilter/: Special:AbuseFilter on all large Wikipedias is returning errors - T106798 (duration: 00m 13s)
* 08:40 hashar: upgrading zuul to zuul_2.0.0-327-g3ebedde-wmf3precise1 to fix a regression ( https://phabricator.wikimedia.org/T106531 )
* 05:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 24 05:53:16 UTC 2015 (duration 53m 15s)
* 05:52 Krinkle: Added rl-test.php on testwiki (mw1017) to gather stats about cache-control rollover (Catrope, Krinkle). Used by testwiki/test2wiki/mediawikiwiki Common.js (sampled). See T105255.
* 02:29 logmsgbot: LocalisationUpdate completed (1.26wmf15) at 2015-07-24 02:29:25+00:00
* 02:26 urandom: restarting restbase on restbase1006
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf15/cache/l10n: (no message) (duration: 07m 12s)
* 02:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 24 02:06:41 UTC 2015 (duration 6m 40s)
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf15) at 2015-07-24 02:02:31+00:00
* 00:21 ori: Re-enabled Puppet on mw1153


== 2015-07-23 ==
== 2021-12-03 ==
* 23:31 logmsgbot: catrope Synchronized php-1.26wmf15/extensions/WikimediaEvents: SWAT (duration: 00m 12s)
* 20:29 cstone: revision changed from {{Gerrit|2c2e22cd}} to {{Gerrit|b82183b9}}
* 23:31 logmsgbot: catrope Synchronized php-1.26wmf15/extensions/CirrusSearch: SWAT (duration: 00m 12s)
* 17:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:30 logmsgbot: catrope Synchronized php-1.26wmf14/extensions/WikimediaEvents: SWAT (duration: 00m 12s)
* 17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:30 logmsgbot: catrope Synchronized php-1.26wmf14/extensions/CirrusSearch: SWAT (duration: 00m 13s)
* 17:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:16 logmsgbot: catrope Synchronized flow.dblist: Enable Flow on viwiki (duration: 00m 12s)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:14 logmsgbot: catrope Synchronized wmf-config/: SWAT (duration: 00m 11s)
* 17:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:14 logmsgbot: catrope Synchronized w/static/images/: SWAT (duration: 00m 12s)
* 17:35 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 23:11 ori: Restarting Apache on mw1153
* 17:22 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 23:09 ori: T84842: Requests to thumb_handler.php/.* don't match the ProxyPass rule and get handled by Zend instead. To see how HHVM actually handles these requests, I'm disabling Puppet on mw1153 and dropping the '$' anchor from the ProxyPass rules.
* 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:02 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable geo feature usage tracking on all wikis (duration: 00m 12s)
* 16:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:19 hashar: is already a nice improvement
* 16:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 20:33 twentyafterfour: deployed hotfix for T106716, restarted apache on iridium
* 16:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:46 logmsgbot: catrope Synchronized php-1.26wmf15/resources/src/mediawiki.less/mediawiki.ui/mixins.less: Unbreak quiet button styles (duration: 00m 13s)
* 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:10 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf15
* 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:56 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repooling es2004 after hardware maintenance (duration: 00m 11s)
* 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:56 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repooling es2004 after hardware maintenance (duration: 00m 12s)
* 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner2001.codfw.wmnet
* 17:38 legoktm: running foreachwikiindblist /home/legoktm/largebutnotenwiki.dblist populateContentModel.php --ns=all --table=page
* 14:10 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner2001.codfw.wmnet
* 16:27 ori: restarted hhvm on mw1221
* 12:53 moritzm: installing nss security updates on stretch
* 16:16 logmsgbot: thcipriani Finished scap: SWAT: Add azb interwiki sorting, Add Southern Luri, and Fix name of S and W Balochi (duration: 06m 13s)
* 12:37 moritzm: draining primary/secondary instances off ganeti2007 [[phab:T296622|T296622]]
* 16:14 urandom: restarting Cassandra on restbase1001 to (temporarily) enable GC logging
* 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 16:10 logmsgbot: thcipriani Started scap: SWAT: Add azb interwiki sorting, Add Southern Luri, and Fix name of S and W Balochi
* 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 15:38 moritzm: added jenkins-debian-glue 0.13.0 to apt.wikimedia.org (jessie-wikimedia)
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 15:35 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: fix references to non-existent wikis [[gerrit:226470]] (duration: 00m 13s)
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 15:31 _joe_: rebooting ms-be1003, stuck in kernel locks
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS buster
* 15:31 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove reference to nonexistent ru_sibwiki.png [[gerrit:226469]] (duration: 00m 14s)
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS buster
* 15:26 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wgSitename and wgMetaNamespace for pnbwiki [[gerrit:226543]] (duration: 00m 12s)
* 11:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2011.codfw.wmnet with OS buster
* 15:15 logmsgbot: thcipriani Synchronized wmf-config/CommonSettings.php: SWAT: Set a different wmgContentTranslationDefaultSourceLanguage for English part II [[gerrit:224031]] (duration: 00m 12s)
* 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
* 15:14 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Set a different wmgContentTranslationDefaultSourceLanguage for English part I [[gerrit:224031]] (duration: 00m 13s)
* 11:06 jynus: stop and shutdown db1102 [[phab:T296546|T296546]]
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wgSitename and wgMetaNamespace for pnbwikipedia [[gerrit:225322]] (duration: 00m 12s)
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 13:08 mobrovac: graphoid deploying 81b9633
* 11:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 10:56 jynus: disabling puppet on maps-test hosts to debug service issue
* 09:38 moritzm: draining primary/secondary instances off ganeti2011 [[phab:T296622|T296622]]
* 07:28 _joe_: upgrading hhvm on the canary appservers
* 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 06:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 23 06:59:44 UTC 2015 (duration 59m 43s)
* 09:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 06:42 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1070, warm up (duration: 00m 13s)
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 04:25 logmsgbot: ori Synchronized php-1.26wmf15/extensions/Scribunto/common/Base.php: (no message) (duration: 00m 13s)
* 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 04:24 logmsgbot: ori Synchronized php-1.26wmf14/extensions/Scribunto/common/Base.php: (no message) (duration: 00m 12s)
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18019 and previous config saved to /var/cache/conftool/dbconfig/20211203-091537-marostegui.json
* 04:04 springle: upgrade & reboot db1070
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18018 and previous config saved to /var/cache/conftool/dbconfig/20211203-090033-marostegui.json
* 03:04 logmsgbot: LocalisationUpdate completed (1.26wmf15) at 2015-07-23 03:04:48+00:00
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS buster
* 03:00 logmsgbot: l10nupdate Synchronized php-1.26wmf15/cache/l10n: (no message) (duration: 07m 24s)
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18017 and previous config saved to /var/cache/conftool/dbconfig/20211203-084528-marostegui.json
* 02:39 springle: temporarily silenced backup4001 check_disk space icinga noise; seems important, but not exploding-any-minute-now
* 08:44 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-23 02:37:55+00:00
* 08:43 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 13s)
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18016 and previous config saved to /var/cache/conftool/dbconfig/20211203-083023-marostegui.json
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 23 02:07:12 UTC 2015 (duration 7m 11s)
* 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 02:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1070 (duration: 00m 12s)
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18015 and previous config saved to /var/cache/conftool/dbconfig/20211203-082859-marostegui.json
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf15) at 2015-07-23 02:03:03+00:00
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-23 02:03:02+00:00
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 01:45 logmsgbot: ori Synchronized php-1.26wmf15/includes/libs/objectcache/APCBagOStuff.php: I4b2cf1715538 (duration: 00m 12s)
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18014 and previous config saved to /var/cache/conftool/dbconfig/20211203-082848-marostegui.json
* 01:45 logmsgbot: ori Synchronized php-1.26wmf14/includes/libs/objectcache/APCBagOStuff.php: I4b2cf1715538 (duration: 00m 12s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18013 and previous config saved to /var/cache/conftool/dbconfig/20211203-081343-marostegui.json
* 01:05 twentyafterfour: phab is back
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18012 and previous config saved to /var/cache/conftool/dbconfig/20211203-075839-marostegui.json
* 01:03 logmsgbot: ori Synchronized php-1.26wmf14/includes/libs/objectcache/APCBagOStuff.php: I4b2cf1715 (duration: 00m 12s)
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18011 and previous config saved to /var/cache/conftool/dbconfig/20211203-074334-marostegui.json
* 01:01 legoktm: twentyafterfour is upgrading phabricator
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18010 and previous config saved to /var/cache/conftool/dbconfig/20211203-073910-marostegui.json
* 00:50 yurik: deployed kartotherian fix, still not starting as a service, and no idea why. Have no access to logs. Frustrated.
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:46 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/225515/ (duration: 00m 12s)
* 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: fix extra dollar mark in https://gerrit.wikimedia.org/r/#/c/226336/1/wmf-config/InitialiseSettings.php (duration: 00m 12s)
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/225541/ (duration: 00m 13s)
* 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:02 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/225541/ (duration: 00m 12s)
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18009 and previous config saved to /var/cache/conftool/dbconfig/20211203-073404-marostegui.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18008 and previous config saved to /var/cache/conftool/dbconfig/20211203-071900-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18007 and previous config saved to /var/cache/conftool/dbconfig/20211203-070355-marostegui.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18006 and previous config saved to /var/cache/conftool/dbconfig/20211203-064850-marostegui.json
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18005 and previous config saved to /var/cache/conftool/dbconfig/20211203-062019-marostegui.json
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18004 and previous config saved to /var/cache/conftool/dbconfig/20211203-062011-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18003 and previous config saved to /var/cache/conftool/dbconfig/20211203-060506-marostegui.json
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18002 and previous config saved to /var/cache/conftool/dbconfig/20211203-055001-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18001 and previous config saved to /var/cache/conftool/dbconfig/20211203-053457-marostegui.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18000 and previous config saved to /var/cache/conftool/dbconfig/20211203-053032-marostegui.json
* 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 01:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
* 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
* 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
* 01:01 tgr: UTC late deploys done
* 01:00 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:743177{{!}}Add an image: Add test version of GEInfoboxTemplates (T291232)]] (duration: 00m 57s)
* 00:44 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.1-1_amd64.changes
* 00:37 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step2 (duration: 00m 56s)
* 00:36 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Config/Validation/GrowthConfigValidation.php: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step 1 (duration: 00m 56s)
* 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster


== 2015-07-22 ==
== 2021-12-02 ==
* 23:56 cwdent: updated civicrm from 292ad137f6b3ffc818a3bd617ca4f335931091f3 to 83cacfa1e0852ffaf47d2f02e7d843cf6f3bcda4
* 20:05 legoktm: re-pooling mw1414 following testing
* 23:55 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: re-try reverted portion of https://gerrit.wikimedia.org/r/#/c/118654/ using NS IDs instead of not-necessarily-defined constants which were causing warning flood (duration: 00m 13s)
* 19:35 legoktm: installing yaml PHP extension on canaries
* 23:51 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/118654/ (duration: 00m 12s)
* 19:29 andrewbogott: upgrading wikitech-static deb packages as well as moving to mediawiki 1.37.0
* 23:47 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=171578&oldid=171570 (duration: 00m 12s)
* 19:26 majavah: UTC evening deploys done
* 23:47 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=171578&oldid=171570 (duration: 00m 12s)
* 19:26 taavi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/webUIScroll.js: Backport: [[gerrit:743227{{!}}Update scroll instrument (T294246)]] (duration: 00m 56s)
* 23:40 yurik: deployed kartotherian
* 19:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720363{{!}}Drop old config names for CentralAuth denylist controls (T277932)]] (duration: 00m 56s)
* 23:24 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/224393/ (duration: 00m 12s)
* 19:12 taavi@deploy1002: Synchronized wmf-config: Config: [[gerrit:739032{{!}}GrowthExperiments configuration fixes (T294737)]] (duration: 00m 57s)
* 23:24 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224393/ (duration: 00m 13s)
* 18:56 legoktm: upgraded scap to 4.1.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary ([[phab:T296867|T296867]])
* 23:19 logmsgbot: krenair Synchronized php-1.26wmf15/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/226447/ (duration: 00m 13s)
* 18:45 legoktm: uploaded scap 4.1.0 to apt.wm.o ([[phab:T296867|T296867]])
* 22:52 Reedy: populateSitesTable.php finished
* 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:09 Reedy: running in screen as reedy on tin foreachwikiindblist wikidataclient.dblist extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
* 18:19 vgutierrez: re-enable puppet on cp3064 - [[phab:T296874|T296874]]
* 22:09 logmsgbot: reedy Synchronized database lists: Add azbwiki to wikidataclient.dblist (duration: 00m 11s)
* 18:14 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 20:55 cscott: updated Parsoid to version 6befc44e
* 17:51 vgutierrez: puppet disabled on cp3064 to manually increase number of maxconns in HAProxy - [[phab:T296874|T296874]]
* 20:26 logmsgbot: twentyafterfour Synchronized php-1.26wmf15/includes/libs/MultiHttpClient.php: Deploy https://gerrit.wikimedia.org/r/#/c/226388/ (duration: 00m 12s)
* 17:38 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/743216/; as a result of the fix `'-Dwdqs.throttling-filter.time-bucket-capacity-in-seconds=240', '-Dwdqs.throttling-filter.time-bucket-refill-amount-in-seconds=120', '-Dwdqs.throttling-filter.ban-duration-in-minutes=60'` will now be in the `extra_jvm_opts` for `wdqs-internal` hosts
* 19:57 legoktm: re-attributed edits to User:Mirwin~enwiki (T106069)
* 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:34 logmsgbot: demon Finished scap: azbwiki namespace stuff (duration: 42m 57s)
* 15:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:30 moritzm: updated remaining Ubuntu systems for openssl/export grade update
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17997 and previous config saved to /var/cache/conftool/dbconfig/20211202-145151-marostegui.json
* 18:51 logmsgbot: demon Started scap: azbwiki namespace stuff
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17996 and previous config saved to /var/cache/conftool/dbconfig/20211202-143646-marostegui.json
* 18:49 logmsgbot: demon Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17995 and previous config saved to /var/cache/conftool/dbconfig/20211202-142141-marostegui.json
* 18:48 logmsgbot: demon Synchronized langlist: azbwiki++ (duration: 00m 12s)
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17994 and previous config saved to /var/cache/conftool/dbconfig/20211202-140636-marostegui.json
* 18:48 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: azbwiki++ (duration: 00m 12s)
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17993 and previous config saved to /var/cache/conftool/dbconfig/20211202-140557-marostegui.json
* 18:47 logmsgbot: demon Synchronized w/static/images/project-logos/azbwiki.png: azbwiki++ (duration: 00m 12s)
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:45 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: azbwiki++
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:44 logmsgbot: demon Synchronized database lists: azbwiki++ (duration: 00m 13s)
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17992 and previous config saved to /var/cache/conftool/dbconfig/20211202-140548-marostegui.json
* 18:18 legoktm: running populateContentModel.php --ns=all --table=page on all medium wikis
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17990 and previous config saved to /var/cache/conftool/dbconfig/20211202-135043-marostegui.json
* 18:08 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf15
* 13:49 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in eqiad
* 18:08 logmsgbot: twentyafterfour Synchronized php-1.26wmf15/extensions/MobileFrontend/includes/MobileFrontend.hooks.php: deploy https://gerrit.wikimedia.org/r/#/c/226313/ (duration: 00m 13s)
* 13:37 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in codfw
* 16:03 _joe_: installed the hhvm 3.6.5 on deployment-prep
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17989 and previous config saved to /var/cache/conftool/dbconfig/20211202-133538-marostegui.json
* 15:52 _joe_: uploaded hhvm_3.6.5+dfsg1-1+wm1 to reprepro
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17988 and previous config saved to /var/cache/conftool/dbconfig/20211202-132034-marostegui.json
* 15:47 logmsgbot: thcipriani Synchronized w/static/images/project-logos/lrcwiki.png: SWAT: Update the logo of lrcwiki [[gerrit:220358]] (duration: 00m 13s)
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17987 and previous config saved to /var/cache/conftool/dbconfig/20211202-131959-marostegui.json
* 15:27 logmsgbot: jynus Synchronized wmf-config: removing db-secondary.php (duration: 00m 12s)
* 13:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 15:26 logmsgbot: jynus Synchronized docroot/noc: removing db-secondary.php from the list of symlinks to maintain (duration: 00m 12s)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 14:20 hashar: enabling puppet on labnodepool1001.eqiad.wmnet
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17986 and previous config saved to /var/cache/conftool/dbconfig/20211202-131949-marostegui.json
* 14:04 moritzm: added cython_0.20.1+git90-g0e6e38e-1ubuntu2~precise1 to precise-wikimedia on carbon (required for activemq backport on precise)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17985 and previous config saved to /var/cache/conftool/dbconfig/20211202-130444-marostegui.json
* 11:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1071 to normal load (duration: 00m 12s)
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17983 and previous config saved to /var/cache/conftool/dbconfig/20211202-124940-marostegui.json
* 08:03 _joe_: repooling mw1158-60
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17982 and previous config saved to /var/cache/conftool/dbconfig/20211202-123435-marostegui.json
* 07:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 22 07:22:36 UTC 2015 (duration 22m 35s)
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17981 and previous config saved to /var/cache/conftool/dbconfig/20211202-123356-marostegui.json
* 05:22 logmsgbot: ori Synchronized php-1.26wmf14/extensions/Scribunto/common/Base.php: Cherry-pick I53dd1ecb (duration: 00m 13s)
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 05:22 logmsgbot: ori Synchronized php-1.26wmf15/extensions/Scribunto/common/Base.php: Cherry-pick I53dd1ecb (duration: 00m 13s)
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 04:43 logmsgbot: ori Synchronized php-1.26wmf14/extensions/Scribunto/common/Base.php: Revert: Live-hack I53dd1ecb to test impact (duration: 00m 12s)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17980 and previous config saved to /var/cache/conftool/dbconfig/20211202-123348-marostegui.json
* 04:35 gwicke: deployed small restbase hotfix d96210f2
* 12:31 moritzm: installing NSS security updates
* 04:28 logmsgbot: ori Synchronized php-1.26wmf14/extensions/Scribunto/common/Base.php: Live-hack I53dd1ecb to test impact (duration: 00m 13s)
* 12:27 Lucas_WMDE: UTC morning backport+config window done
* 04:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1071, warm up (duration: 00m 12s)
* 12:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743116{{!}}Wikisource: enable proofreading change-tagging for all Wikisources (T289140)]] (duration: 00m 57s)
* 04:14 springle: upgrade db1071 trusty
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17979 and previous config saved to /var/cache/conftool/dbconfig/20211202-121843-marostegui.json
* 03:10 logmsgbot: LocalisationUpdate completed (1.26wmf15) at 2015-07-22 03:10:23+00:00
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2009.codfw.wmnet with OS buster
* 03:04 logmsgbot: l10nupdate Synchronized php-1.26wmf15/cache/l10n: (no message) (duration: 10m 33s)
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17978 and previous config saved to /var/cache/conftool/dbconfig/20211202-120338-marostegui.json
* 02:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1071 (duration: 00m 11s)
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-22 02:37:45+00:00
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17977 and previous config saved to /var/cache/conftool/dbconfig/20211202-114833-marostegui.json
* 02:33 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 01s)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17976 and previous config saved to /var/cache/conftool/dbconfig/20211202-114755-marostegui.json
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 22 02:07:33 UTC 2015 (duration 7m 32s)
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf15) at 2015-07-22 02:03:19+00:00
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-22 02:03:18+00:00
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:47 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17975 and previous config saved to /var/cache/conftool/dbconfig/20211202-114711-marostegui.json
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17974 and previous config saved to /var/cache/conftool/dbconfig/20211202-113206-marostegui.json
* 11:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 moritzm: draining primary/secondary instances off ganeti2022 [[phab:T296622|T296622]]
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17973 and previous config saved to /var/cache/conftool/dbconfig/20211202-111702-marostegui.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17972 and previous config saved to /var/cache/conftool/dbconfig/20211202-110157-marostegui.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17971 and previous config saved to /var/cache/conftool/dbconfig/20211202-110120-marostegui.json
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17970 and previous config saved to /var/cache/conftool/dbconfig/20211202-110110-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17969 and previous config saved to /var/cache/conftool/dbconfig/20211202-104606-marostegui.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17968 and previous config saved to /var/cache/conftool/dbconfig/20211202-103100-marostegui.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17967 and previous config saved to /var/cache/conftool/dbconfig/20211202-101555-marostegui.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17966 and previous config saved to /var/cache/conftool/dbconfig/20211202-101522-marostegui.json
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17964 and previous config saved to /var/cache/conftool/dbconfig/20211202-100307-marostegui.json
* 09:52 moritzm: draining primary/secondary instances off ganeti2009 [[phab:T296622|T296622]]
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17963 and previous config saved to /var/cache/conftool/dbconfig/20211202-094802-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17962 and previous config saved to /var/cache/conftool/dbconfig/20211202-093257-marostegui.json
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17961 and previous config saved to /var/cache/conftool/dbconfig/20211202-091753-marostegui.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17960 and previous config saved to /var/cache/conftool/dbconfig/20211202-091629-marostegui.json
* 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2010.codfw.wmnet with OS buster
* 08:29 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 4h)
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 02:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:40 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:15 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:14 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 01:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 01:21 ryankemper: [[phab:T280001|T280001]] Rolling restart of low-traffic pybal hosts complete. All of `wcqs` is pooled and the pybal / ipvs related alerts have cleared
* 01:16 ryankemper: [[phab:T280001|T280001]] Pooled `wcqs200[1-3]` (had been left unpooled from when we last removed wcqs from production)
* 01:12 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2009*,lvs1015*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 01:11 ryankemper: [[phab:T280001|T280001]] Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
* 01:08 ryankemper: [[phab:T280001|T280001]] Sanity check of `sudo ipvsadm -L -n` on backups `lvs2010` and `lvs1016` looks good (e.g. `lvs1016` has `TCP  10.2.2.67:443 wrr`)
* 01:07 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2010*,lvs1016*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 01:02 ryankemper: [[phab:T280001|T280001]] `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
* 01:01 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841
* 01:00 ryankemper: [[phab:T280001|T280001]] About to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841 to bring `wcqs` into state `lvs_setup`, after which I'll perform a rolling restart of pybal
* 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/skins/Vector/: {{Gerrit|a7586cd4a2559248ea1fd29cf74de535de016501}}: Update scroll observer to allow event logging ([[phab:T292586|T292586]]) (duration: 00m 57s)


== 2015-07-21 ==
== 2021-12-01 ==
* 23:45 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Set $wgVectorResponsive = true on testwiki (duration: 00m 12s)
* 22:15 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 23:39 logmsgbot: catrope Synchronized php-1.26wmf14/extensions/VisualEditor: SWAT (duration: 00m 13s)
* 22:15 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 23:37 logmsgbot: catrope Synchronized php-1.26wmf15/extensions/VisualEditor: SWAT (duration: 00m 13s)
* 22:13 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 23:08 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: Enable tracking of geo feature usage on enwiki (duration: 00m 12s)
* 22:13 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 23:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable tracking of geo feature usage on enwiki (duration: 00m 13s)
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 23:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: trying this again: group0 to 1.26wmf15
* 22:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:59 logmsgbot: twentyafterfour Finished scap: test: syncing 1.26wmf15 again (duration: 20m 51s)
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 01m 23s)
* 22:54 chasemp: 22:50 <  chasemp> "then git reset --hard 9588d0a6844fc9cc68372f4bf3e1eda3cffc8138 in  /etc/zuul/wikimedia"
* 22:11 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:51 chasemp: gallium 'service zuul stop && service zuul-merger stop && sudo apt-get install zuul=2.0.0-304-g685ca22-wmf1precise1' DOWNGRADE due to errors
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 22:39 logmsgbot: twentyafterfour Started scap: test: syncing 1.26wmf15 again
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:27 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: revert group0 to 1.26wmf15
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 22:26 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf15
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:20 ori: Accepted mw1090's minion key on palladium
* 22:09 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 21:21 logmsgbot: twentyafterfour Finished scap: sync 1.26wmf15 branch + localization cache, remove wmf8 (duration: 27m 32s)
* 22:09 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 20:53 logmsgbot: twentyafterfour Started scap: sync 1.26wmf15 branch + localization cache, remove wmf8
* 21:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 20:53 logmsgbot: twentyafterfour Purged l10n cache for 1.26wmf11
* 21:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 20:52 logmsgbot: twentyafterfour Purged l10n cache for 1.26wmf10
* 21:11 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 16s)
* 20:51 logmsgbot: twentyafterfour Purged l10n cache for 1.26wmf9
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 20:28 hasharConfcall: Zuul no longer reports any results back to Gerrit :( Fix being deployed
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257]: (no justification provided)
* 19:56 ori: Dropping AccountAudit table on all wikis (T105894)
* 21:09 razzi@deploy1002: Finished deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794] (duration: 21m 18s)
* 19:45 logmsgbot: ori Synchronized wmf-config: I3887fd6c: Disable AccountAudit (duration: 00m 12s)
* 21:06 jynus: installing python-monotonic on ms-fe2011, ms-fe2012 (breaks swift-proxy)
* 18:07 logmsgbot: ori Synchronized php-1.26wmf14/extensions/Scribunto: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/Scribunto  5af0350e2d09444db279f58504967d0e9b154534 (duration: 00m 13s)
* 21:02 jynus: installing python-monotonic on ms-fe2010
* 18:06 logmsgbot: ori Synchronized php-1.26wmf14/extensions/WikimediaEvents: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/WikimediaEvents  968890f1a256a08a02925e4bdb53a8e8d64aacea (duration: 00m 13s)
* 20:48 razzi@deploy1002: Started deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794]
* 17:08 _joe_: restarted logmsgbot, ircecho on neon
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:20 logmsgbot: thcipriani Synchronized php-1.26wmf14/extensions/Wikidata: SWAT: Update Wikibase: Add api featureLog for ungroupedlist param [[gerrit:226086]] (duration: 00m 20s)
* 20:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:01 logmsgbot: thcipriani Synchronized php-1.26wmf13/extensions/Wikidata: SWAT: Update Wikibase: Add api featureLog for ungroupedlist param [[gerrit:226086]] (duration: 00m 20s)
* 19:46 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 15:37 godog: cleanup ganglia temp files on uranium
* 19:46 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 15:34 logmsgbot: thcipriani Synchronized php-1.26wmf14/includes/filerepo/file/File.php: SWAT: Thumbnail logging and stats part II [[gerrit:225936]] (duration: 00m 12s)
* 19:30 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 22s)
* 15:34 logmsgbot: thcipriani Synchronized php-1.26wmf14/thumb.php: SWAT: Thumbnail logging and stats part I [[gerrit:225936]] (duration: 00m 12s)
* 19:30 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 15:29 logmsgbot: thcipriani Synchronized php-1.26wmf14/includes/filerepo/file/File.php: SWAT: Thumbnail logging and stats part II [[gerrit:225936]] (duration: 00m 13s)
* 19:27 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 02m 26s)
* 15:28 logmsgbot: thcipriani Synchronized php-1.26wmf14/thumb.php: SWAT: Thumbnail logging and stats part I [[gerrit:225936]] (duration: 00m 11s)
* 19:25 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 15:20 cmjohnson1: re-installing mw1090
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 15:12 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Offer 400px as a thumbnail size available in Special:Preferences [[gerrit:226051]] (duration: 00m 12s)
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 15:08 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Assign thumbnail access log to Monolog debug channel [[gerrit:225935]] (duration: 00m 13s)
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 13:57 _joe_: depooling mw1158-60 from the imagescaler pool, to test HHVM-only imagescalers
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 05:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 21 05:08:32 UTC 2015 (duration 8m 31s)
* 19:18 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-21 02:26:59+00:00
* 19:18 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 06m 55s)
* 19:13 majavah: UTC evening deploys done
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 21 02:07:22 UTC 2015 (duration 7m 21s)
* 19:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742834{{!}}Add mediawiki.web_ui_scroll stream (T292586)]] (duration: 00m 57s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-21 02:03:11+00:00
* 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS buster
* 18:39 vgutierrez: pool cp1089 using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS buster
* 17:54 vgutierrez: depool cp1089 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:08 moritzm: installing postgresql-9.6 security updates
* 15:54 godog: bounce logstash on eqiad/codfw to apply template changes
* 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 15:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 15:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 15:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 15:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 15:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17955 and previous config saved to /var/cache/conftool/dbconfig/20211201-150853-marostegui.json
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17954 and previous config saved to /var/cache/conftool/dbconfig/20211201-145348-marostegui.json
* 14:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17953 and previous config saved to /var/cache/conftool/dbconfig/20211201-143843-marostegui.json
* 14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard1001.eqiad.wmnet
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 14:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard2001.codfw.wmnet
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17951 and previous config saved to /var/cache/conftool/dbconfig/20211201-142339-marostegui.json
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17950 and previous config saved to /var/cache/conftool/dbconfig/20211201-142227-marostegui.json
* 14:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17949 and previous config saved to /var/cache/conftool/dbconfig/20211201-142219-marostegui.json
* 14:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 14:13 jynus: started commonswiki codfw media backup at 8 threads of parallelism
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17948 and previous config saved to /var/cache/conftool/dbconfig/20211201-140715-marostegui.json
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 13:56 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17947 and previous config saved to /var/cache/conftool/dbconfig/20211201-135210-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17946 and previous config saved to /var/cache/conftool/dbconfig/20211201-133705-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17945 and previous config saved to /var/cache/conftool/dbconfig/20211201-133554-marostegui.json
* 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17944 and previous config saved to /var/cache/conftool/dbconfig/20211201-133546-marostegui.json
* 13:30 moritzm: set "sudo gnt-cluster modify --hypervisor-parameters kvm:machine_version=pc-i440fx-2.8" for ganeti eqiad cluster [[phab:T294120|T294120]]
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17942 and previous config saved to /var/cache/conftool/dbconfig/20211201-132041-marostegui.json
* 13:19 vgutierrez: restore haproxy 2.2.9 on cp3064 - [[phab:T290005|T290005]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17939 and previous config saved to /var/cache/conftool/dbconfig/20211201-130536-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17938 and previous config saved to /var/cache/conftool/dbconfig/20211201-125031-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17937 and previous config saved to /var/cache/conftool/dbconfig/20211201-124919-marostegui.json
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17936 and previous config saved to /var/cache/conftool/dbconfig/20211201-122020-marostegui.json
* 12:11 urbanecm: EU B&C window done
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8ab29b2feb47d611873cf0465b2a2dd5eac0ad2}}: enwikisource: enable anonymous talk page mobile tabs ([[phab:T47955|T47955]]) (duration: 00m 56s)
* 12:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2bd14e8968c90b2562f045457d61b252728e6250}}: Add templateeditor group and protection level at viwiki ([[phab:T296154|T296154]]) (duration: 00m 56s)
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17935 and previous config saved to /var/cache/conftool/dbconfig/20211201-120515-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17934 and previous config saved to /var/cache/conftool/dbconfig/20211201-115011-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17933 and previous config saved to /var/cache/conftool/dbconfig/20211201-113506-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17932 and previous config saved to /var/cache/conftool/dbconfig/20211201-113354-marostegui.json
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:31 vgutierrez: test HAProxy 2.4.9 on cp3064 - [[phab:T290005|T290005]]
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17931 and previous config saved to /var/cache/conftool/dbconfig/20211201-112952-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17930 and previous config saved to /var/cache/conftool/dbconfig/20211201-111448-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17929 and previous config saved to /var/cache/conftool/dbconfig/20211201-105943-marostegui.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17928 and previous config saved to /var/cache/conftool/dbconfig/20211201-104438-marostegui.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17927 and previous config saved to /var/cache/conftool/dbconfig/20211201-104316-marostegui.json
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17926 and previous config saved to /var/cache/conftool/dbconfig/20211201-104308-marostegui.json
* 10:29 Lucas_WMDE: Deployed patch for [[phab:T296578|T296578]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17925 and previous config saved to /var/cache/conftool/dbconfig/20211201-102804-marostegui.json
* 10:23 vgutierrez: test haproxy_2.2.19-1~bpo10+1 on cp3064 - [[phab:T290005|T290005]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17924 and previous config saved to /var/cache/conftool/dbconfig/20211201-101259-marostegui.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17923 and previous config saved to /var/cache/conftool/dbconfig/20211201-095754-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17922 and previous config saved to /var/cache/conftool/dbconfig/20211201-095632-marostegui.json
* 09:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17921 and previous config saved to /var/cache/conftool/dbconfig/20211201-095624-marostegui.json
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:46 taavi@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:742925{{!}}beta: Update mx host]] (duration: 00m 55s)
* 09:43 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwiki extensions/CheckUser/maintenance/fixTrailingSpacesInLogs.php
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17920 and previous config saved to /var/cache/conftool/dbconfig/20211201-094120-marostegui.json
* 09:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: [[gerrit:742853{{!}}Drop using ft_title and ft_namespace (T296380)]] (duration: 00m 56s)
* 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17919 and previous config saved to /var/cache/conftool/dbconfig/20211201-092615-marostegui.json
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17918 and previous config saved to /var/cache/conftool/dbconfig/20211201-091110-marostegui.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17917 and previous config saved to /var/cache/conftool/dbconfig/20211201-090948-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:03 vgutierrez: rolling restart of haproxy and varnish on O:cache::text_haproxy and O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 08:56 moritzm: draining primary/secondary instance off ganeti2010 [[phab:T296622|T296622]]
* 08:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:32 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:32 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/NewcomerTasksUserOptionsLookup.php: Backport: [[gerrit:742548{{!}}Newcomer tasks: Fix filtering of non-existent task types (T296366)]] (duration: 00m 56s)
* 00:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742817{{!}}Enable A/B test enrollment instrumentation. (T292587)]] (duration: 00m 56s)
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2015-07-20 ==
== 2021-11-30 ==
* 23:43 gwicke: removed experimental nodes (1008, 1009) from system.peers on production C* nodes
* 23:59 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:29 ejegg: updated fundraising/tools from 9a9e7881d25f101cc612cfae6375c0a1c9b0f55d to 3e0e3ae799a507b378d0ece3e71631b10b361329
* 23:57 mutante: deploy1002 - kube_env miscweb staging ; helmfile -e staging destroy
* 20:55 XenoRyet: updated payments from ebb1a9e52172a4793cf5feb33220b4d7edfcad70 to 152a64a035a59e67b4469223b8f83609bae523a3
* 23:56 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:40 gwicke: (eevans, gwicke) removed *.hprof heap dumps from /var/lib/cassandra, freeing up a lot of space especially on 1004 & 1005
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 gwicke: deployed restbase 0951a6d to remaining nodes
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:55 gwicke: canary restbase deploy of 0951a6d on restbase1001
* 23:09 mutante: gerrit - added Majavah to wmf-deployment group for [[phab:T296777|T296777]]
* 16:44 godog: powercycle mw1090, no console no anything
* 22:30 krinkle@deploy1002: Finished deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}} (duration: 00m 09s)
* 15:31 ejegg: updated AstroPay curl timeout setting on payments to 12 seconds
* 22:30 krinkle@deploy1002: Started deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}}
* 05:32 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 20 05:32:31 UTC 2015 (duration 32m 30s)
* 22:21 mutante: welcome Majavah to MediaWiki deployers ([[phab:T296777|T296777]])
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-20 02:28:03+00:00
* 20:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5443b78f197b782238632966891d721859733a74}}: uzwiki: Deploy Growth features to newcomers ([[phab:T294245|T294245]]) (duration: 00m 57s)
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 07s)
* 18:09 legoktm: uploaded php-yaml for component/php72 ([[phab:T296331|T296331]])
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 20 02:07:34 UTC 2015 (duration 7m 33s)
* 18:08 vgutierrez: restart haproxy on cp3064 - [[phab:T290005|T290005]]
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-20 02:03:24+00:00
* 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17912 and previous config saved to /var/cache/conftool/dbconfig/20211130-174434-jynus.json
* 00:02 mutante: DNS update - adding language "azb" to langlist
* 17:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17911 and previous config saved to /var/cache/conftool/dbconfig/20211130-173935-jynus.json
* 17:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17910 and previous config saved to /var/cache/conftool/dbconfig/20211130-173517-jynus.json
* 17:34 moritzm: installing libvorbis security updates
* 17:15 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 5%', diff saved to https://phabricator.wikimedia.org/P17908 and previous config saved to /var/cache/conftool/dbconfig/20211130-171550-jynus.json
* 17:00 jynus: move db1139:s1 under db1118
* 16:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17907 and previous config saved to /var/cache/conftool/dbconfig/20211130-165718-jynus.json
* 16:29 XioNoX: Move cr2-codfw lumen transit link to BO cable - [[phab:T289241|T289241]]
* 16:26 XioNoX: Move cr2-codfw eqord link to BO cable - [[phab:T289241|T289241]]
* 16:23 XioNoX: Move cr2-codfw pfw3 link to BO cable - [[phab:T289241|T289241]]
* 16:20 Emperor: reboot ms-be2059 to fix device enumeration order re [[phab:T295563|T295563]]
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17906 and previous config saved to /var/cache/conftool/dbconfig/20211130-161457-jynus.json
* 16:13 XioNoX: cr2-codfw bounce fpc 1 pic 0 (vrrp backup) - [[phab:T289241|T289241]]
* 16:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17905 and previous config saved to /var/cache/conftool/dbconfig/20211130-160748-jynus.json
* 16:06 bblack: lvs2007 - repooling into service
* 16:01 bblack: lvs2007 - depooling for network maint - do not push LVS config changes please!
* 15:41 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:41 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:37 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 jforrester@deploy1002: Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster [[phab:T284162|T284162]] (duration: 00m 56s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:25 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17904 and previous config saved to /var/cache/conftool/dbconfig/20211130-131124-marostegui.json
* 13:05 topranks: Running homer against CR routers to adjust loopback4 filter enabling local NTP queries for status. [[phab:T296623|T296623]]
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17903 and previous config saved to /var/cache/conftool/dbconfig/20211130-125620-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17902 and previous config saved to /var/cache/conftool/dbconfig/20211130-124115-marostegui.json
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17901 and previous config saved to /var/cache/conftool/dbconfig/20211130-122610-marostegui.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17900 and previous config saved to /var/cache/conftool/dbconfig/20211130-122555-marostegui.json
* 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard1001.eqiad.wmnet
* 12:02 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 11:50 moritzm: running "sudo gnt-cluster renew-crypto --new-node-certificates --new-rapi-certificate --new-spice-certificate" for Ganeti codfw cluster [[phab:T296622|T296622]]
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui for updates in eqiad
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui in codfw
* 10:39 elukey: rollout wmf-certificates 0~20211129-1 fleet wide (add group/others permissions to the cert bundle)
* 10:30 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:29 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:58 moritzm: installing remaining ICU security updates
* 09:06 Amir1: dropping wikiadmin@localhost from all pooled replicas of s6 ([[phab:T296511|T296511]])
* 08:24 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 6 hours)
* 08:14 Amir1: revoking DROP from wikiadmin on all pooled replicas ([[phab:T249683|T249683]])
* 03:46 ejegg: updated payments-wiki from {{Gerrit|dbc92132}} to {{Gerrit|4a4ef51d}}
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:17 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742524{{!}}Enable scroll tracking for all users (T292586)]] (duration: 00m 55s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:14 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/readingDepth.js: Backport: [[gerrit:742517{{!}}Provide fallback for config variable when not present]] (duration: 00m 55s)
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 catrope@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:738530{{!}}allow sysops to set/remove reviewer group on ckbwiki (T294696)]] (duration: 00m 55s)


== 2015-07-19 ==
== 2021-11-29 ==
* 20:52 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225822/ (duration: 00m 12s)
* 22:32 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/EntitySchema/src/MediaWiki/Specials/SetEntitySchemaLabelDescriptionAliases.php: Deploy security patch for [[phab:T296578|T296578]] (duration: 00m 55s)
* 19:10 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic0573f26: Follow-up for I189d748: whitelist 'archive.org' too (duration: 00m 12s)
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:06 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I189d748a: Whitelist *.archive.org for wgCopyUploadsDomains (T106293) (duration: 00m 13s)
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable IP user page creation on fawiki's Draft ns (duration: 00m 11s)
* 22:20 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FileImporter/src/Remote/MediaWiki/HttpApiLookup.php: Backport: [[gerrit:742263{{!}}SECURITY: Fix special page displaying unescaped user input (T296605)]] (duration: 00m 56s)
* 18:18 logmsgbot: ori Synchronized php-1.26wmf14/includes/site/SiteSQLStore.php: I0e5f2d3b2: Use CACHE_ACCEL for SiteLists if on HHVM (duration: 00m 12s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 logmsgbot: ori Synchronized wmf-config: Ib508a440: Undeploy VectorBeta (Task: T87489) (duration: 00m 13s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225718/ (duration: 00m 12s)
* 20:46 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Fix wgWikiLambdaOrchestratorLocation service pointer typo (duration: 00m 55s)
* 17:21 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/225705/ (duration: 00m 12s)
* 20:27 tgr: UTC evening deploys done
* 17:14 logmsgbot: krenair Synchronized w/static/images/project-logos/arbcom_enwiki.png: https://gerrit.wikimedia.org/r/#/c/225705/ (duration: 00m 12s)
* 20:26 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742261{{!}}GrowthExperiments: Start imagerecommendation variant experiment]] (duration: 00m 55s)
* 05:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 19 05:10:10 UTC 2015 (duration 10m 9s)
* 20:23 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: Backport: [[gerrit:742262{{!}}AddImage: Refresh user's task feed after undecided rejection (T296491)]] (duration: 00m 56s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-19 02:27:35+00:00
* 20:21 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:742260{{!}}SuggestedEdits: Drop isActivated() check in getJsData (T296626)]] (duration: 00m 56s)
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 04s)
* 20:17 ejegg: updated payments-wiki from {{Gerrit|d1d6f024}} -> {{Gerrit|dbc92132}}
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 19 02:07:15 UTC 2015 (duration 7m 14s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-19 02:03:05+00:00
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:10 eileen: civicrm
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295705|T295705]] Move CirrusSearch traffic back to eqiad (duration: 00m 56s)
* 19:42 legoktm: uploaded php-yaml_2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1_amd64.changes to apt.wm.o ([[phab:T296331|T296331]])
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 18:55 bblack: repooling esams
* 18:48 bblack: esams: shifting depool method to esams-offline (now that its config is fixed)
* 18:42 legoktm: depooling esams
* 18:17 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 17:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:742259{{!}}rdbms: Add DB host to TransactionProfiler logging and fix time fields (T295706)]] (duration: 00m 56s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:40 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Initial Beta Cluster deployment of Wikifunctions: III - CS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:38 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 jforrester@deploy1002: Synchronized wmf-config/ProductionServices.php: Initial Beta Cluster deployment of Wikifunctions: II - Services for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Initial Beta Cluster deployment of Wikifunctions: I - IS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06d8d25f6e89be0b1692d017bdbc2c9524372c0b}}: foundationwiki: Remove explicit wmgUseOAuth (duration: 00m 57s)
* 16:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bad34ed8d86b30eb4c240da0498ddfb44af30ea7}}: Make foundationwiki a standard CentralAuth wiki ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|567f2a9d4883c9a98a3251f153ea0ad58d7774c6}}: Revert "foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily" ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS buster
* 16:04 moritzm: sudo gnt-cluster upgrade --to 2.16 for Ganeti codfw cluster
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 15:51 James_F: Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for [[phab:T284162|T284162]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS buster
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 papaul: power down logstash2028 for IDRAC reset
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 moritzm: gnt-cluster renew-crypto --new-cluster-certificate for codfw Ganeti cluster [[phab:T296622|T296622]]
* 14:40 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:38 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:37 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:55 vgutierrez: repool cp3064 - [[phab:T290005|T290005]]
* 12:51 moritzm: upgrading ganeti codfw cluster to 2.16 backport [[phab:T296622|T296622]]
* 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 12:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|05704407395fbf227eec47cf716393dc60a36a35}}: Fix error handling in SuggestedEdits::getActionData ([[phab:T296366|T296366]]) (duration: 05m 37s)
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7fdea3e71e4fd9e85c30efbc17f94c0711deb252}}: Add planet4589.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T296136|T296136]]) (duration: 00m 56s)
* 12:11 vgutierrez: pool cp3064 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS buster
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:07 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|4662224229cb4083b8b01de436ccd65e8c00e7dd}}: Remove search.wikimedia.org files ([[phab:T289224|T289224]]) (duration: 00m 56s)
* 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (2/2; [[phab:T296297|T296297]]) (duration: 00m 55s)
* 10:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/Special/SpecialMultiLock.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (1/2; [[phab:T296297|T296297]]) (duration: 00m 56s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d01652ec22f6cb3413b419a3c9b0a7a08d79960f}}: Disable Growth IP research survey ([[phab:T294568|T294568]]) (duration: 00m 56s)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:45 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3064.esams.wmnet with OS buster
* 10:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:01 vgutierrez: depool cp3064 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS buster
* 09:52 vgutierrez: pool cp2041 with HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 moritzm: rolling restart of mediawiki canaries to pick up ICU security updates
* 09:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|3a892860b2e1e2ac7b60fc1c4dbdb2035d6af950}}: foundationwiki: Do not enable wmgUsePageViewInfo explicitly (duration: 00m 55s)
* 09:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=foundationwiki 'inactive' # removing nonexistent group; backup left at P17888
* 09:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|786313c06188d5d63700d7e46384ef99a9297b57}}: foundationwiki: Clear group add/remove declarations (duration: 00m 55s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3f47dc55b67d2b53ec27bb610978ff8165aa6ca}}: foundationwiki: Disable hard redirects (duration: 00m 57s)
* 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS buster
* 08:56 vgutierrez: depool cp2041 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 08:54 moritzm: installing ICU security updates on buster
* 08:33 moritzm: installing bluez security updates
* 08:26 moritzm: installing libvpx security updates
* 08:19 moritzm: installing libntlm security updates
* 08:07 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 07m 01s)
* 08:00 marostegui: Restart db2078 and db1117
* 08:00 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 07:31 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time) (duration: 00m 04s)
* 07:31 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time)
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bullseye
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bullseye


== 2015-07-18 ==
== 2021-11-28 ==
* 20:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: labs only (duration: 00m 12s)
* 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 02m 11s)
* 20:44 YuviPanda: restarted etherpad
* 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 18:56 akosiaris: reinstall labsdb1004
* 16:36 paravoid: Ganglia is up :)
* 16:09 Krenair: Ganglia seems down
* 15:42 Krenair: Doing T44180
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 05:28:25 UTC 2015 (duration 28m 24s)
* 02:34 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-18 02:34:29+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 07m 19s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 18 02:07:38 UTC 2015 (duration 7m 37s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-18 02:03:29+00:00
* 00:49 ejegg: restored recurring globalcollect batch size of 250
* 00:09 ejegg: updated civicrm from 78de1b9b74934984af3099afe9192fa53011bdaa to 292ad137f6b3ffc818a3bd617ca4f335931091f3


== 2015-07-17 ==
== 2021-11-27 ==
* 21:51 ejegg: updated civicrm from 0acac037ce0c9a64e94a475463deb2d47e84193a to 78de1b9b74934984af3099afe9192fa53011bdaa
* 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]] (duration: 04m 14s)
* 20:53 matt_flaschen: Manually fixed issue in mediawikiwiki LQT thread table with rename of Ecliptica to Entropy. https://phabricator.wikimedia.org/T106122#1461380
* 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]]
* 20:03 hashar: stopping Zuul to get rid of a faulty registered function "build:Global-Dev Dashboard Data". Job is gone already.
* 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
* 17:50 ejegg: updated civicrm from fa724dd2e2e69545d81015c943cb7f52cf6de8e1 to 0acac037ce0c9a64e94a475463deb2d47e84193a
* 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
* 16:49 gwicke: restarted restbase on restbase1001
* 12:22 elukey: drop /var/tmp/core files from ores100[2,4], root partition full
* 15:04 gwicke: restarted RB thinner scripts, see https://phabricator.wikimedia.org/T105706
* 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
* 14:10 urandom: restart restbase service on restbase1006
* 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - [[phab:T296563|T296563]]
* 14:07 urandom: restart restbase service on restbase1003
* 11:46 elukey: drop ores coredumps from ores1008
* 14:05 urandom: restart restbase service on restbase1002
* 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
* 13:56 godog: apache2ctl graceful on fluorine antimony argon caesium helium
* 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition
* 13:43 godog: apache2ctl graceful on netmon1001
* 11:24 hashar: rebooted labnodepool1001.eqiad.wmnet. Accidentally deleted the whole /dev, which froze everything :(
* 10:21 _joe_: repooling mw1158
* 09:08 _joe_: depooling mw1158, repooling mw1156,7
* 07:51 _joe_: depooled mw1156,7 for reimaging
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 04:53:56 UTC 2015 (duration 53m 55s)
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1030 (duration: 00m 12s)
* 02:30 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-17 02:30:03+00:00
* 02:26 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 05m 55s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 17 02:07:22 UTC 2015 (duration 7m 20s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-17 02:03:12+00:00
* 01:30 mutante: git pull origin on strontium


== 2015-07-16 ==
== 2021-11-26 ==
* 21:27 ori: bounced nutcracker on mw1139 as well. hashar noticed flood of errors from these hosts on https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors . lack of monitoring / alerts is troubling.
* 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
* 21:26 ori: bounced nutcracker on mw1128 and mw1134
* 16:05 arnoldokoth: drain kubestage1001 node in prep for decommissioning
* 20:50 mutante: iegreview tool - short maintenance downtime
* 15:46 elukey: move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
* 19:39 YuviPanda: imported aspell-id from ubuntu to jessie-wikimedia - needed by ores, simple package that I am not sure why it is not in jessie
* 14:30 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:20 logmsgbot: twentyafterfour Synchronized php-1.26wmf14/includes/db/LoadMonitor.php: Deploying Hotfix for T105373 (duration: 00m 13s)
* 14:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:40 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf14
* 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:26 ejegg: changed batch size from 250 to 1 in RGC jenkins job
* 13:48 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:22 ejegg: updated civicrm from 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7 to fa724dd2e2e69545d81015c943cb7f52cf6de8e1
* 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:56 Jeff_Green: authdns update to rename lutetium.wm.o
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:08 hashar_: kept nodepool stopped on labnodepool1001.eqiad.wmnet because it spams the cron log
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:57 logmsgbot: demon Synchronized multiversion/MWMultiVersion.php: prod no-op, beta change (duration: 00m 13s)
* 12:21 vgutierrez: restarting HAProxy on O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 15:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/224975/ (duration: 00m 12s)
* 11:41 akosiaris: [[phab:T296303|T296303]] cleanup weird state of calico-codfw cluster
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf14/extensions/Math/MathMathML.php: SWAT: Fix: Undefined variable passed hook [[gerrit:225058]] (duration: 00m 12s)
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:03 ejegg: updated payments from 4ca95d55a9745c05ccfbb16ee6f23a6f75328824 to ebb1a9e52172a4793cf5feb33220b4d7edfcad70
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:21 dcausse: es1.6 upgrade: all done
* 11:39 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 11:32 dcausse: restarted gmond on elastic1024
* 11:25 vgutierrez: restarting HAProxy on O:cache::(text{{!}}upload)_haproxy - [[phab:T290005|T290005]]
* 11:06 mobrovac: citoid deploying ff90869
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
* 10:56 dcausse: es1.6 upgrade: upgrade elastic1031
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
* 10:25 mobrovac: citoid rolled back to ffbaf6d
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 10:10 mobrovac: citoid deploying 5aeb0fc
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 10:05 dcausse: es1.6 upgrade: upgrade elastic1030
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
* 09:38 dcausse: es1.6 upgrade: upgrade elastic1029
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
* 08:42 dcausse: es1.6 upgrade: upgrade elastic1028
* 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 07:31 dcausse: es1.6 upgrade: upgrade elastic1027
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 07:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 07:22:49 UTC 2015 (duration 22m 48s)
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 05:53 dcausse: es1.6 upgrade: upgrade elastic1026
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 05:31 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
* 05:24 logmsgbot: krenair Synchronized php-1.26wmf14/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225008/ (duration: 00m 13s)
* 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
* 04:38 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/225006/ (duration: 00m 13s)
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
* 03:54 manybubbles: es1.6 upgrade: upgrade elastic1025
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
* 03:19 logmsgbot: LocalisationUpdate completed (1.26wmf14) at 2015-07-16 03:19:37+00:00
* 06:28 Amir1: killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
* 03:13 logmsgbot: l10nupdate Synchronized php-1.26wmf14/cache/l10n: (no message) (duration: 10m 23s)
* 06:19 Amir1: killing lingering process from mwmaint to depooled db (db1160) that was depooled nine hours ago
* 02:46 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-16 02:46:03+00:00
* 02:43 manybubbles: es1.6 upgrade: upgrade elastic1024
* 02:39 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 10m 50s)
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 16 02:07:55 UTC 2015 (duration 7m 54s)
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf14) at 2015-07-16 02:03:31+00:00
* 02:03 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-16 02:03:30+00:00
* 01:41 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/214981/ (duration: 00m 12s)
* 01:22 manybubbles: es1.6 upgrade: upgrade elastic1023


== 2015-07-15 ==
== 2021-11-25 ==
* 23:36 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221885/ (duration: 00m 13s)
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
* 23:22 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209840/ (duration: 00m 12s)
* 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/194075/ (duration: 00m 12s)
* 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:10 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224799/ (duration: 00m 13s)
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:09 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 13s)
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/175755/ (duration: 00m 12s)
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17871 and previous config saved to /var/cache/conftool/dbconfig/20211125-192850-ladsgroup.json
* 22:23 csteipp: deploy patch for T105305 to wmf13/14
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17870 and previous config saved to /var/cache/conftool/dbconfig/20211125-191345-ladsgroup.json
* 22:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223843/ (duration: 00m 12s)
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17869 and previous config saved to /var/cache/conftool/dbconfig/20211125-185841-ladsgroup.json
* 21:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222584/ (duration: 00m 13s)
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17868 and previous config saved to /var/cache/conftool/dbconfig/20211125-184336-ladsgroup.json
* 21:54 manybubbles: es1.6 upgrade: upgrade elastic1022
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17867 and previous config saved to /var/cache/conftool/dbconfig/20211125-172714-ladsgroup.json
* 21:37 manybubbles: es1.6 upgrade: upgrade elastic1021
* 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 21:09 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Really Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef this time (duration: 01m 32s)
* 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 20:41 bblack: restarted salt-master service on palladium
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17866 and previous config saved to /var/cache/conftool/dbconfig/20211125-172707-ladsgroup.json
* 20:33 bblack: globally cleaning up dangling symlinks left in /etc/certs from before Id7d2447 via salted 'find /etc/ssl/certs -type l -xtype l|xargs rm'
* 17:12 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 20:30 logmsgbot: twentyafterfour Synchronized php-1.26wmf14: Sync If0237cdd0d66634d75b2bab8bc4292c0f3ef75ef (revert Count API module instantiations and Hook runs) (duration: 01m 48s)
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17864 and previous config saved to /var/cache/conftool/dbconfig/20211125-171202-ladsgroup.json
* 20:20 manybubbles: es1.6 upgrade: upgrade elastic1020
* 16:57 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6 (duration: 06m 59s)
* 20:18 RoanKattouw: Running FlowCreateMentionTemplate.php on all Flow wikis
* 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17863 and previous config saved to /var/cache/conftool/dbconfig/20211125-165657-ladsgroup.json
* 20:06 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf14
* 16:50 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6
* 19:50 ejegg: updated civicrm from e29cc5f20b5069afcaff794e628596c1f70d69a3 to 24e0fc854433ea4982e94a0fd2f8bdad8f8dcad7
* 16:49 jynus@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P17862 and previous config saved to /var/cache/conftool/dbconfig/20211125-164941-jynus.json
* 19:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224408/ (duration: 00m 12s)
* 16:46 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next (duration: 01m 04s)
* 19:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 13s)
* 16:45 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next
* 19:00 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/222792/ (duration: 00m 12s)
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17861 and previous config saved to /var/cache/conftool/dbconfig/20211125-164153-ladsgroup.json
* 18:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 16:18 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163++', diff saved to https://phabricator.wikimedia.org/P17860 and previous config saved to /var/cache/conftool/dbconfig/20211125-161833-jynus.json
* 18:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/222776/ (duration: 00m 13s)
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163+', diff saved to https://phabricator.wikimedia.org/P17859 and previous config saved to /var/cache/conftool/dbconfig/20211125-161404-jynus.json
* 18:40 ejegg: updated civicrm from f4219bc8eca5e4db633da07b6ac9e2505cfbae16 to e29cc5f20b5069afcaff794e628596c1f70d69a3
* 16:10 klausman: restarting pybal on lvs2009 [[phab:T289835|T289835]]
* 18:39 logmsgbot: krenair Synchronized wmf-config/throttle.php: throttle labswiki account creations from hackathon at 500 (duration: 00m 12s)
* 15:57 vgutierrez: restarting pybal on lvs2010 - [[phab:T289835|T289835]]
* 18:39 logmsgbot: twentyafterfour Finished scap: group0 to 1.26wmf14 (duration: 32m 34s)
* 15:55 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P17856 and previous config saved to /var/cache/conftool/dbconfig/20211125-155538-jynus.json
* 18:21 manybubbles: es1.6 upgrade: upgrading elastic1019
* 15:47 jynus: reenable gtid on db1163
* 18:20 Jeff_Green: authdns-update shifting to service-oriented hostnames for fundraising cluster
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17853 and previous config saved to /var/cache/conftool/dbconfig/20211125-152906-ladsgroup.json
* 18:06 logmsgbot: twentyafterfour Started scap: group0 to 1.26wmf14
* 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:55 ejegg: updated civicrm from 6560cefa8d7e68e35e30b310d6691ab57798a4c9 to f4219bc8eca5e4db633da07b6ac9e2505cfbae16
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:34 Jeff_Green: authdns-update to remove boron.wm.o
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17852 and previous config saved to /var/cache/conftool/dbconfig/20211125-152858-ladsgroup.json
* 17:22 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: partially revert https://gerrit.wikimedia.org/r/#/c/224420/1/wmf-config/CommonSettings.php - doesn't quite work (duration: 00m 13s)
* 15:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1001.eqiad.wmnet
* 17:17 Jeff_Green: authdns-update to remove aluminium, also lanthanum by preexisting commit
* 15:19 klausman@cumin1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubesvc
* 16:45 andrewbogott: rebooting labvirt1005
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17851 and previous config saved to /var/cache/conftool/dbconfig/20211125-151354-ladsgroup.json
* 16:43 mutante: accepting unaccepted salt keys for ganeti VMs: planet, bromine, krypton
* 15:13 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping1001.eqiad.wmnet
* 16:39 mutante: krypton - signing puppet cert, initial run
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3001.esams.wmnet
* 16:26 andrewbogott: woo, first try!
* 15:05 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping3001.esams.wmnet
* 16:23 andrewbogott: trying to kill labvirt1005 via repeated instance suspend/resume
* 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2001.codfw.wmnet
* 16:04 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17850 and previous config saved to /var/cache/conftool/dbconfig/20211125-145849-ladsgroup.json
* 16:03 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224420/ (duration: 00m 12s)
* 14:54 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping2001.codfw.wmnet
* 16:01 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224808/ (duration: 00m 12s)
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17849 and previous config saved to /var/cache/conftool/dbconfig/20211125-144344-ladsgroup.json
* 15:58 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222581/ (duration: 00m 11s)
* 14:42 XioNoX: Update ping redirect to point to new ping VMs - [[phab:T295767|T295767]]
* 15:35 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 11s)
* 14:25 jayme: uncordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet - [[phab:T293729|T293729]]
* 15:29 logmsgbot: krenair Synchronized docroot/noc/createTxtFileSymlinks.sh: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 15:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 12s)
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:20 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/139326/ (duration: 00m 11s)
* 14:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:33 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 13:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1002.eqiad.wmnet
* 14:22 legoktm: sync failed on mw1090.eqiad.wmnet, read only filesystem
* 13:32 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping1002.eqiad.wmnet
* 14:20 logmsgbot: legoktm Synchronized php-1.26wmf13/extensions/CentralAuth/includes/CentralAuthPlugin.php: Add log entry for $wgCentralAuthStrict failures if SULMigration is enabled (duration: 00m 13s)
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2002.codfw.wmnet
* 13:55 dcausse: es1.6 upgrade: upgrade elastic1018
* 13:28 Amir1: killing lingering process from mwmaint to depooled db1147
* 13:24 springle: entry below not mw1216 fault, but r/o filesystem error on mw1090
* 13:20 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping2002.codfw.wmnet
* 13:15 springle: sync-common on mw1216 after sync-file from tin failed non-zero exit status 12
* 13:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3002.esams.wmnet
* 13:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1022 T105879 (duration: 00m 12s)
* 13:05 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping3002.esams.wmnet
* 11:43 dcausse: es1.6 upgrade: upgrade elastic1017
* 12:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 08:27 dcausse: es1.6 upgrade: upgrade elastic1016
* 12:14 arturo: update repo bullseye-wikimedia/thirdparty/ceph-octopus ([[phab:T296175|T296175]])
* 06:31 dcausse: es1.6 upgrade: upgrade elastic1015
* 12:14 jynus: disable temp. gtid on db1163
* 05:40 dcausse: es1.6 upgrade: upgrade elastic1014
* 12:11 jynus@cumin1001: dbctl commit (dc=all): 'Temp. depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17847 and previous config saved to /var/cache/conftool/dbconfig/20211125-121138-jynus.json
* 05:10 springle: db1030 busy removing table partitioning
* 12:04 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load even more', diff saved to https://phabricator.wikimedia.org/P17846 and previous config saved to /var/cache/conftool/dbconfig/20211125-120435-jynus.json
* 04:28 manybubbles: es1.6 upgrade: lowered the shard transfer settings back to our normal rate. going to bed.
* 11:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 04:12 manybubbles: es1.6 upgrade: upgrade elastic1013
* 11:56 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load', diff saved to https://phabricator.wikimedia.org/P17845 and previous config saved to /var/cache/conftool/dbconfig/20211125-115602-jynus.json
* 03:49 springle: upgrade db1030 trusty
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17844 and previous config saved to /var/cache/conftool/dbconfig/20211125-110443-ladsgroup.json
* 03:29 manybubbles: es1.6 upgrade: upgrade elastic1012
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 03:14 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-15 03:14:21+00:00
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 03:10 logmsgbot: reedy Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 13m 32s)
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17843 and previous config saved to /var/cache/conftool/dbconfig/20211125-110435-ladsgroup.json
* 03:03 manybubbles: es1.6 upgrade: raised limits on shard migration rate - should speed up the restart. we should lower it before we do restarts during europe's morning
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17842 and previous config saved to /var/cache/conftool/dbconfig/20211125-104930-ladsgroup.json
* 02:10 Reedy: Running LU manually to see what's wrong with it
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17841 and previous config saved to /var/cache/conftool/dbconfig/20211125-103425-ladsgroup.json
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 15 02:07:48 UTC 2015 (duration 7m 47s)
* 10:25 vgutierrez: rolling restart of varnish and HAProxy on cp2042.codfw.wmnet,cp1090.eqiad.wmnet,cp[5012].eqsin.wmnet,cp3065.esams.wmnet,cp[4026,4032].ulsfo.wmnet to disable PROXY protocol - [[phab:T290005|T290005]]
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-15 02:02:55+00:00
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17840 and previous config saved to /var/cache/conftool/dbconfig/20211125-101921-ladsgroup.json
* 09:55 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 09:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:39 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:29 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 09:27 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 09:24 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:23 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 09:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 09:16 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:10 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 09:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:59 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:51 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:50 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17837 and previous config saved to /var/cache/conftool/dbconfig/20211125-084834-ladsgroup.json
* 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:47 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:18 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:17 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:14 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 08:13 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:09 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:08 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 08:03 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:00 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:57 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:56 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:51 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore{{!}}sessionstore)
* 07:49 marostegui: Stop mysql on db1133 to clone db1128 as a test host [[phab:T295965|T295965]]
* 07:49 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:48 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:47 jayme: elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main
* 07:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 07:35 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:29 elukey_: elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:20 jelto@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntax
* 07:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:17 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:10 jelto: downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 07:09 jelto: start re-deploy procedure in eqiad Kubernetes [[phab:T251305|T251305]]
* 06:31 marostegui: Restart tendril's DB
* 05:51 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s)
* 04:43 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 04:40 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS
* 04:39 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:35 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s)
* 04:30 ryankemper: [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'`
* 04:27 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet
* 04:25 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93
* 04:25 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003`
* 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster
* 02:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster
* 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster
* 02:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster
* 02:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster
* 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster
* 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster
* 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster
* 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster


== 2015-07-14 ==
== 2021-11-24 ==
* 23:46 manybubbles: es1.6 upgrade: upgraded elastic1011
* 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
* 23:22 bblack: updating nginx to 1.9.3-1+wmf1 on cp*
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
* 23:17 bblack: reprepro: nginx for jessie-wikimedia/main bumped to 1.9.3-1+wmf1
* 23:44 mutante: puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet {{!}}  sudo install_console gitlab-runner1001.eqiad.wmnet ([[phab:T295481|T295481]])
* 22:22 ejegg: updated civicrm from 04efc7d5c7bbb068f907125f2184692aee676123 to 6560cefa8d7e68e35e30b310d6691ab57798a4c9
* 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS [[phab:T295481|T295481]]
* 21:29 Reedy: mw1090 fs is ro
* 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
* 21:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Fix testwiki
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
* 21:05 _joe|AFK: depooling mw1090, ext4 errors in syslog, filesystem mounted read-only
* 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete  - to fix Icinga alert about large files in client bucket
* 21:01 logmsgbot: twentyafterfour Synchronized wmf-config/CommonSettings.php: revert LCStoreStaticArray (duration: 00m 12s)
* 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 20:59 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf14 and rebuild localization cache (duration: 72m 45s)
* 23:03 mutante: wcqs1001 -  sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
* 20:42 bblack: undoing LCStoreStaticArray because appservers look unhealthy, using ori's command: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"'
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 19:46 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf14 and rebuild localization cache
* 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row $<nowiki>{</nowiki>row<nowiki>}</nowiki>: $(sudo gnt-instance list -o name -F "pnode.group == 'row_$<nowiki>{</nowiki>row<nowiki>}</nowiki>'" {{!}} wc -l) VMs"; done
* 19:23 manybubbles: es1.6 step iforget: upgrade elasticsearch on elastic1010
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org
* 17:41 mutante: terbium:  /usr/local/bin/foreachwiki extensions/Echo/maintenance/processEchoEmailBatch.php
* 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster
* 17:10 dcausse: es1.6 step 10: upgrade elastic1009
* 22:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster
* 16:23 mutante: bromine - apt-get upgrade
* 22:38 mutante: running decom cookbook on gitlab-runner1001.wikimedia.org VM which was in state "ADMIN_down" and not used yet. to make room to recreate it as gitlab-runner1001.eqiad.wmnet [[phab:T295481|T295481]]
* 15:08 logmsgbot: manybubbles Synchronized php-1.26wmf13/extensions/UniversalLanguageSelector/: SWAT add some hooks to extension.json (duration: 00m 13s)
* 22:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org
* 14:34 gwicke: started RESTBase revision thin-out script for html and data-parsoid on wikimedia domains
* 22:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster
* 14:01 dcausse: es1.6 step 9: upgrade elastic1008
* 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster
* 12:48 _joe_: reimaging mw1155
* 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 ori: Logging a message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log.
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:28 dcausse: es1.6 step 8: upgrade elastic1007
* 21:35 legoktm@deploy1002: Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s)
* 11:25 _joe_: repooling mw1154 with HHVM
* 21:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster
* 10:12 _joe_: stopped poolcounter on mw1154
* 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS buster
* 10:06 _joe_: reimaging mw1154
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:49 dcausse: es1.6 step 7: upgrade elastic1006
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 07:09:10 UTC 2015 (duration 9m 9s)
* 20:54 legoktm@deploy1002: Synchronized wmf-config/: Update configuration related to disabling Score functionality (duration: 00m 57s)
* 06:48 dcausse: es1.6 step 6: upgrade elastic1005
* 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS buster
* 06:41 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9c9bf0f4: Use LCStoreStaticArray unconditionally (duration: 03m 02s)
* 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17834 and previous config saved to /var/cache/conftool/dbconfig/20211124-194857-ladsgroup.json
* 05:26 ori: Cleaned up now-unused hhbc files from /run/hhvm/cache on job runners
* 19:38 razzi: `sudo maintain-views --all-databases --replace-all` on clouddb1018 for [[phab:T292594|T292594]]
* 04:58 ori: Enabling LCStoreStaticArray in production. May be reverted by running: 'salt -G deployment_target:scap/scap cmd.run "rm /etc/lcstore"' on palladium.
* 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17833 and previous config saved to /var/cache/conftool/dbconfig/20211124-193352-ladsgroup.json
* 04:48 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Follow-up for Ieb62ee050e: allow LCStoreStaticArray in server mode (duration: 00m 13s)
* 19:19 razzi: run `maintain-views --all-databases --replace-all` on clouddb1013 for [[phab:T292594|T292594]]
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-14 02:35:21+00:00
* 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17832 and previous config saved to /var/cache/conftool/dbconfig/20211124-191847-ladsgroup.json
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 07m 27s)
* 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17831 and previous config saved to /var/cache/conftool/dbconfig/20211124-190343-ladsgroup.json
* 02:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 14 02:07:32 UTC 2015 (duration 7m 30s)
* 18:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2002.codfw.wmnet
* 02:02 logmsgbot: LocalisationUpdate failed (1.26wmf13) at 2015-07-14 02:02:33+00:00
* 18:51 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2002.codfw.wmnet
* 01:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 18:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2001.codfw.wmnet
* 18:43 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2001.codfw.wmnet
* 18:30 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2001.codfw.wmnet
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17830 and previous config saved to /var/cache/conftool/dbconfig/20211124-174723-ladsgroup.json
* 17:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17829 and previous config saved to /var/cache/conftool/dbconfig/20211124-174615-ladsgroup.json
* 17:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741134{{!}}rdbms: Add full query to transaction profiler (T295706)]] (duration: 00m 56s)
* 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:34 jhathaway@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=puppetboard
* 17:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17828 and previous config saved to /var/cache/conftool/dbconfig/20211124-173110-ladsgroup.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
* 17:22 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
* 17:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum2001.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum2001.codfw.wmnet
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17827 and previous config saved to /var/cache/conftool/dbconfig/20211124-171604-ladsgroup.json
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
* 17:08 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
* 17:05 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399] (duration: 06m 45s)
* 17:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2003.codfw.wmnet
* 17:01 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17826 and previous config saved to /var/cache/conftool/dbconfig/20211124-170100-ladsgroup.json
* 17:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399] (duration: 00m 07s)
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399] (duration: 32m 50s)
* 16:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
* 16:50 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:44 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2005.codfw.wmnet
* 16:43 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:41 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2005.codfw.wmnet
* 16:41 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:40 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:38 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2006.codfw.wmnet
* 16:36 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2002.codfw.wmnet
* 16:36 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2006.codfw.wmnet
* 16:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741132{{!}}rdbms: Make TransactionProfiler logs more useful (T295706)]] (duration: 00m 57s)
* 16:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2002.codfw.wmnet
* 16:33 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2004.codfw.wmnet
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2003.codfw.wmnet
* 16:31 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2004.codfw.wmnet
* 16:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2003.codfw.wmnet
* 16:25 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2001.codfw.wmnet
* 16:25 mforns@deploy1002: Started deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399]
* 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
* 16:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2001.codfw.wmnet
* 16:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
* 16:19 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
* 16:16 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
* 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 Amir1: start of "foreachwikiindblist s3 migrateRevisionActorTemp.php --sleep=2" in mwmaint1002 in a screen. It will take a month or so ([[phab:T275246|T275246]])
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 btullis: systemctl reset-failed ifup@ens5.service on schema2004 [[phab:T273026|T273026]]
* 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2004.codfw.wmnet
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17821 and previous config saved to /var/cache/conftool/dbconfig/20211124-154533-ladsgroup.json
* 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17820 and previous config saved to /var/cache/conftool/dbconfig/20211124-154236-ladsgroup.json
* 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon2002.codfw.wmnet
* 15:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2004.codfw.wmnet
* 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2003.codfw.wmnet
* 15:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon2002.codfw.wmnet
* 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc2001.wikimedia.org
* 15:34 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2003.codfw.wmnet
* 15:32 papaul: reboot ms-be2058 for firmware upgrade
* 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc2001.wikimedia.org
* 15:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2001.codfw.wmnet
* 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17819 and previous config saved to /var/cache/conftool/dbconfig/20211124-152731-ladsgroup.json
* 15:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2001.codfw.wmnet
* 15:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode2001.codfw.wmnet
* 15:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode2001.codfw.wmnet
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab2001.wikimedia.org
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17817 and previous config saved to /var/cache/conftool/dbconfig/20211124-151226-ladsgroup.json
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2001.codfw.wmnet
* 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2001.codfw.wmnet
* 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17815 and previous config saved to /var/cache/conftool/dbconfig/20211124-145721-ladsgroup.json
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 14:39 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2031.codfw.wmnet
* 14:36 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2031.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2001.wikimedia.org
* 14:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:31 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2030.codfw.wmnet
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:28 godog: systemctl reset-failed ifup@ens5.service on logstash2024 [[phab:T273026|T273026]]
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2001.wikimedia.org
* 14:26 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2030.codfw.wmnet
* 14:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp2001.wikimedia.org
* 14:21 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2025.codfw.wmnet
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:15 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2025.codfw.wmnet
* 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2001.wikimedia.org
* 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2024.codfw.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2001.wikimedia.org
* 14:00 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2024.codfw.wmnet
* 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM serpens.wikimedia.org
* 13:55 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2023.codfw.wmnet
* 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM serpens.wikimedia.org
* 13:49 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2023.codfw.wmnet
* 13:41 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2006.codfw.wmnet
* 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2006.codfw.wmnet
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17813 and previous config saved to /var/cache/conftool/dbconfig/20211124-133809-ladsgroup.json
* 13:37 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2005.codfw.wmnet
* 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17812 and previous config saved to /var/cache/conftool/dbconfig/20211124-133628-ladsgroup.json
* 13:36 XioNoX: add Jayme r/o user to all network devices
* 13:35 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2005.codfw.wmnet
* 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
* 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2004.codfw.wmnet
* 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp2001.wikimedia.org
* 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp2001.wikimedia.org
* 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17811 and previous config saved to /var/cache/conftool/dbconfig/20211124-131519-ladsgroup.json
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17810 and previous config saved to /var/cache/conftool/dbconfig/20211124-130200-ladsgroup.json
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt2001.wikimedia.org
* 12:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt2001.wikimedia.org
* 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana2001.codfw.wmnet
* 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana2001.codfw.wmnet
* 12:48 jbond: enable puppet post puppetdb reboot
* 12:48 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
* 12:46 jelto@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17809 and previous config saved to /var/cache/conftool/dbconfig/20211124-124420-ladsgroup.json
* 12:43 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
* 12:37 jbond: disable puppet for puppetdb reboot
* 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2002.wikimedia.org
* 12:29 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2002.wikimedia.org
* 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2001.wikimedia.org
* 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2001.wikimedia.org
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases2002.codfw.wmnet
* 12:23 awight: EU scap deployment finished
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases2002.codfw.wmnet
* 12:21 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737195{{!}}Replace global with parent scope]] (duration: 00m 55s)
* 12:16 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737193{{!}}[lint] fully-qualify classname]] (duration: 00m 55s)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb2001.codfw.wmnet
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb2001.codfw.wmnet
* 12:10 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:740766{{!}}VisualEditor template dialog: new sidebar and inline descriptions (T284203, T286992)]] (duration: 00m 57s)
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2001.wikimedia.org
* 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:03 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2001.wikimedia.org
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox-dev2001.wikimedia.org
* 12:02 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:01 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox-dev2001.wikimedia.org
* 11:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
* 11:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
* 11:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2003.codfw.wmnet
* 11:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 11:49 moritzm: systemctl reset-failed ifup@ens5.service on poolcounter2003 [[phab:T273026|T273026]]
* 11:48 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2003.codfw.wmnet
* 11:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2004.codfw.wmnet
* 11:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 11:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2004.codfw.wmnet
* 11:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:35 godog: bounce apache2 on logstash1025
* 11:35 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:32 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 Amir1: optimizing image.commonswiki in db1141 ([[phab:T296143|T296143]])
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17808 and previous config saved to /var/cache/conftool/dbconfig/20211124-112539-ladsgroup.json
* 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
* 11:23 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
* 11:15 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
* 11:13 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2002.codfw.wmnet
* 11:05 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2002.codfw.wmnet
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2001.codfw.wmnet
* 10:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 10:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2001.codfw.wmnet
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui2001.codfw.wmnet
* 10:48 XioNoX: rollback: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui2001.codfw.wmnet
* 10:47 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:46 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people2002.codfw.wmnet
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people2002.codfw.wmnet
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping2001.codfw.wmnet
* 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping2001.codfw.wmnet
* 10:27 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 XioNoX: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:24 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:17 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:14 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:13 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:12 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:06 jelto: downtime PyBal backends health check for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2002.codfw.wmnet
* 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2002.codfw.wmnet
* 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 10:02 vgutierrez: repool cp5006 - [[phab:T290005|T290005]]
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2001.codfw.wmnet
* 10:00 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2001.codfw.wmnet
* 09:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor2002.codfw.wmnet
* 09:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor2002.codfw.wmnet
* 09:54 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:53 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 09:53 vgutierrez: restart varnish/haproxy on cp5006 - [[phab:T290005|T290005]]
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 09:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install2003.wikimedia.org
* 09:49 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install2003.wikimedia.org
* 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx2001.wikimedia.org
* 09:45 vgutierrez: depool cp5006 - [[phab:T290005|T290005]]
* 09:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx2001.wikimedia.org
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet2002.codfw.wmnet
* 09:30 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=apple-search,name=eqiad
* 09:24 jelto@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxhighlight{{!}}she
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid2002.codfw.wmnet
* 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid2002.codfw.wmnet
* 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM deneb.codfw.wmnet
* 09:08 _joe_: switching search.wikimedia.org to be served by the apple-search service
* 09:04 jelto: start re-deploy procedure in codfw Kubernetes [[phab:T251305|T251305]]
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM deneb.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 _joe_: repooling cp2027
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:741082{{!}}Set actor migration to write both on all wikis (T275246)]] (duration: 00m 57s)
* 08:51 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:41 vgutierrez: depool cp2027
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 07:23 elukey: reboot kubernetes1018 (role::insetup) to verify negotiated speed of eth interface
* 07:12 elukey: drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 and other blockmgr-* dirs on stat1006 to free space on the root partition
* 06:47 Amir1: running optimize table with replication on db1155:3314 ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17807 and previous config saved to /var/cache/conftool/dbconfig/20211124-063228-root.json
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17806 and previous config saved to /var/cache/conftool/dbconfig/20211124-061725-root.json
* 06:05 marostegui: Upgrade db1128's kernel [[phab:T288720|T288720]]
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17805 and previous config saved to /var/cache/conftool/dbconfig/20211124-060221-root.json
* 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17804 and previous config saved to /var/cache/conftool/dbconfig/20211124-054718-root.json
* 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS buster


== 2015-07-13 ==
== 2021-11-23 ==
* 23:22 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/VisualEditor: SWAT (duration: 00m 11s)
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
* 23:11 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Add title to Parsoid exception logging (duration: 00m 12s)
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
* 22:45 logmsgbot: legoktm Synchronized wmf-config: Revert "Set $wgCentralAuthStrict = true;" (duration: 00m 13s)
* 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
* 22:41 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 13s)
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
* 22:41 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Set $wgCentralAuthStrict = true; (duration: 00m 12s)
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
* 22:16 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/User.php: Add 'AuthPluginStrict' log to identify users who are unable to authenticate (duration: 00m 13s)
* 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 12s)
* 21:58 tgr: UTC evening deploys done
* 22:15 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/Hooks.php: Revert "Revert "Revert Count API module instantiations and Hook runs"" (duration: 00m 13s)
* 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
* 22:13 ejegg: updated payments from ec34ebf61e5962f66b807abdcb519ff323d41e8e to 4ca95d55a9745c05ccfbb16ee6f23a6f75328824
* 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 22:00 manybubbles: es1.6 step 4: upgrade elastic1003
* 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
* 21:54 ori: Debugging metric issue on graphite1001, brief stats drop possible
* 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
* 21:32 legoktm: renaming ~3k users who were originally missed for SULF
* 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/Hooks.php: (no message) (duration: 00m 12s)
* 21:47 tgr@deploy1002: Started scap: (no justification provided)
* 21:08 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: (no message) (duration: 00m 13s)
* 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740777{{!}}Add Image: Validate GEInfoboxTemplates size (T294518)]] (duration: 00m 56s)
* 20:42 logmsgbot: ori Synchronized php-1.26wmf13/includes/api/ApiMain.php: f9c89d2814: Revert "Revert Count API module instantiations and Hook runs" (duration: 00m 13s)
* 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: [[gerrit:740776{{!}}Structured task caching/filtering cherry-picks step 3]] (duration: 00m 55s)
* 20:30 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ieb62ee05: Temporary hack to facilitate migration of l10n cache implementations (duration: 00m 11s)
* 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740775{{!}}Structured task caching/filtering cherry-picks step 2]] (duration: 00m 57s)
* 19:42 hoo: Updated Wikidata's property suggester with data from today's json dump
* 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 19:24 manybubbles_: es1.6 step 3: upgrade elastic1002
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 legoktm: running populateContentModel.php --table=page on all small wikis
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 andrewbogott: two of two
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 mutante: morebots - are you 1.7.11 ?
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 andrewbogott: one of two
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 legoktm: running populateContentModel.php --table=page on testwiki
* 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default ([[phab:T296270|T296270]]) (duration: 00m 57s)
* 18:29 manybubbles_: es1.6 step 2: shut down extra instance of elasticsearch on elastic1021
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:39 andrewbogott: this is the second test log of three
* 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:39 andrewbogott: this is the first test log of three
* 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:36 mutante: included adminbot_1.7.11 in APT repo
* 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|7d5f779a73594bb11f359bda055f2c7af8e92feb}}: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
* 16:31 andrewbogott: wikidata-dev updated local puppet and rebooting property-suggester
* 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|c26e407118e1cd8e1e3fea6e2f4e3e43a609ea62}}: GrowthExperiments backports (duration: 01m 03s)
* 16:08 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:07 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/224087/ (duration: 00m 12s)
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:11 manybubbles_: all done SWATing.
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 2/2) (duration: 00m 56s)
* 15:09 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable footer contact link on ukwiki (duration: 00m 11s)
* 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 1/2) (duration: 00m 56s)
* 14:55 manybubbles_: after upgrading elasticsearch, its init script no longer shuts down the old version of elasticsearch, so you have to kill it manually. That means the upgrade instructions will be "special" this time around. Hopefully this is a one-time thing.
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:45 manybubbles_: es1.6 step 1: upgrade elasticsearch on elastic1001 -starting
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3993aacbfdbbfb6cdcc198ce369bf08b32ace865}}: Increase reading depth sampling rate to .1% ([[phab:T294777|T294777]]) (duration: 00m 57s)
* 14:45 manybubbles_: es1.6 step 0: successfully synced new versions of plugins
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:30 manybubbles_: es1.6 step 0: sync new versions of plugins
* 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:30 manybubbles_: starting the elasticsearch 1.6.0 upgrade
* 18:25 ejegg: updated SmashPig standalone (IPN listener) from {{Gerrit|be68299b}} -> {{Gerrit|211f8e65}}
* 13:13 bblack: updating nginx/bind on cp*
* 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:07 bblack: updating openssl on cp*
* 18:18 cmjohnson1: upgrading msw-c1-eqiad [[phab:T259758|T259758]]
* 13:02 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/Cite/extension.json: https://gerrit.wikimedia.org/r/#/c/224407/ - unbreak VE mobile, https://phabricator.wikimedia.org/T105686 (duration: 00m 12s)
* 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:58 mobrovac: restbase deploying 6dec79d
* 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:22 logmsgbot: ori Synchronized php-1.26wmf13/maintenance/rebuildLocalisationCache.php: 117f60a171: rebuildLocalisationCache: don't limit memory usage (duration: 00m 12s)
* 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 [[phab:T273026|T273026]]
* 08:52 godog: bounce graphite-web on graphite1001
* 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:51 godog: bounce carbon daemons on graphite1001
* 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 08:50 godog: upgrade graphite to 0.9.13 on graphite1001 and bounce one instance of carbon/cache
* 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships ([[phab:T243037|T243037]])
* 07:29 logmsgbot: ori Synchronized php-1.26wmf13/includes/cache/LCStoreStaticArray.php: I3f63594a4: Fix variable name (follows Ib2c5856d) (duration: 00m 11s)
* 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 06:25 logmsgbot: LocalisationUpdate failed: git pull of core failed
* 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 06:24 ori: Experimenting with altering the localisation cache implementation for testwiki, operations/mediawiki-config on tin will have a local hack for a little bit
* 17:35 ebernhardson: [[phab:T295478|T295478]] start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
* 05:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 05:07:32 UTC 2015 (duration 7m 31s)
* 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 13 02:25:58 UTC 2015 (duration 25m 57s)
* 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:23:43+00:00
* 17:31 cmjohnson1: upgrading msw's in row D eqiad [[phab:T259758|T259758]]
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 16s)
* 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-13 02:10:25+00:00
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
* 02:10 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 01:47 springle: restarted labsdb1002 mysqld while troubleshooting replication
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
* 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
* 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
* 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
* 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
* 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
* 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
* 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
* 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
* 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
* 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
* 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad [[phab:T259758|T259758]]
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 15:46 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS stretch
* 15:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:27 Emperor: rolling restart of thanos frontends [[phab:T294380|T294380]]
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:34 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=puppetboard
* 14:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:00 marostegui: Failover m5 from db1128 to db1132 - [[phab:T288720|T288720]]
* 14:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 13:50 godog: powercycle (again) ms-be2058
* 13:48 godog: add 80G to prometheus global in eqiad
* 13:31 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 13:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:01 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 12:52 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1002-dev.eqiad.wmnet
* 12:46 Lucas_WMDE: UTC morning backport+config window done
* 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:43 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1002-dev.eqiad.wmnet
* 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:737503{{!}}Set up beta test environment for QuickSurveys (T293798)]] (beta only) (duration: 00m 55s)
* 12:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740784{{!}}OSD: Handle cases where the image srcset attr is not set (T296260)]] (duration: 00m 56s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:26 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740778{{!}}OSD: Add a ready hook for scripts (T180569)]] (duration: 00m 56s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 11:54 btullis@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 11:51 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart (exit_code=97) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:51 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2002.codfw.wmnet
* 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2002.codfw.wmnet
* 11:25 godog: powercycle ms-be2058 - down and nothing on console
* 11:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5012.eqsin.wmnet with OS buster
* 11:15 vgutierrez: pool cp5012 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 Amir1: start of mwscript migrateRevisionActorTemp.php --wiki=testwiki --sleep=5 ([[phab:T275246|T275246]])
* 11:05 jayme: cordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:05 jayme: uncordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:740807{{!}}Set test wikis to write both for actor temp table migration (T275246)]] (duration: 00m 56s)
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17800 and previous config saved to /var/cache/conftool/dbconfig/20211123-102234-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:19 urbanecm@deploy1002: Finished scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates (duration: 11m 06s)
* 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:08 urbanecm@deploy1002: Started scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates
* 10:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5012.eqsin.wmnet with OS buster
* 10:01 vgutierrez: depool cp5012 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:57 jayme: cordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet - [[phab:T293729|T293729]]
* 09:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bullseye
* 09:27 Amir1: dropping useless GRANTs on s6 eqiad replicas without replication ([[phab:T296274|T296274]])
* 09:16 Amir1: dropping useless GRANTs on s6 eqiad master without replication ([[phab:T296274|T296274]])
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
* 09:05 Amir1: fixing incorrect grants of wikiadmin on localhost in s6 master in codfw with replication
* 07:52 topranks: Adjusting BGP on cr1-eqiad and cr2-eqiad to keep MED unchanged in iBGP.
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 05:29 ryankemper: [[phab:T295705|T295705]] Downtimed `elastic2044` for one hour and doing a full reboot for good measure. Already ran the plugin upgrade: `DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins`
* 05:26 ryankemper: [[phab:T295705|T295705]] Rolling restart of `codfw` complete. `elastic2044` was manually restarted earlier today so the cookbook didn't restart it (b/c we pass in a datetime cutoff threshold) so I'm manually upgrading and restarting that host
* 05:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 04:17 ryankemper: [[phab:T295705|T295705]] Properly disabled the sane-itizer; we don't want it running until after we (a) complete rolling restarts and (b) restore the missing `commonswiki_file` index (which is blocked on the restarts)
* 03:42 Amir1: ladsgroup@mwmaint1002:~$ cat broken_imgs {{!}} xargs -I <nowiki>{</nowiki><nowiki>}</nowiki> mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start=<nowiki>{</nowiki><nowiki>}</nowiki> --end=<nowiki>{</nowiki><nowiki>}</nowiki> ([[phab:T296001|T296001]])
* 03:37 Amir1: rebuilding metadata of all djvu files outside of commons ([[phab:T296001|T296001]])
* 03:06 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:58 ryankemper: [[phab:T295705|T295705]] `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.codfw.wmnet', port=9243): Read timed out. (read timeout=60))` Probably transient failure; will wait 10 mins and try again
* 02:57 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:55 ryankemper: [[phab:T295705|T295705]] `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation codfw "codfw plugin upgrade + restart" --upgrade --nodes-per-run 2 --start-datetime 2021-11-18T18:55:54 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_codfw`
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:17 urbanecm: UTC late window done
* 01:17 urbanecm@deploy1002: Finished scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4) (duration: 25m 50s)
* 01:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:51 urbanecm@deploy1002: Started scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4)
* 00:50 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/autoload.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 3/4) (duration: 00m 55s)
* 00:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specials/: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 2/4) (duration: 00m 55s)
* 00:48 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specialpage/SpecialPageFactory.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 1/4) (duration: 00m 56s)
* 00:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9209433dfc8b1f81a165ec75867337800db24b1}}: Enable reading depth instrumentation at low sampling rate ([[phab:T294777|T294777]]) (duration: 00m 56s)
* 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: {{Gerrit|3f860c7}}: {{Gerrit|fa9fbf1}}: WikimediaEvents backports (2/2; [[phab:T294777|T294777]]) (duration: 00m 55s)
* 00:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/extension.json: {{Gerrit|3f860c72bca817c40486b90f0d8e0ffca72b2690}}: Restore ReadingDepth instrument (1/2) (duration: 00m 56s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/739908


== 2015-07-12 ==
== 2021-11-22 ==
* 14:59 bblack: upgraded most packages on sodium
* 23:55 mutante: acmechief1001, acmechief-test1001: sudo systemctl restart reload-acme-chief-backend.timer
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 23:54 mutante: acmechief1001, acmechief-test1001: sudo systemctl start reload-acme-chief-backend.timer
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 23:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2011.codfw.wmnet with OS stretch
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 23:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2010.codfw.wmnet with OS stretch
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS stretch
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS stretch
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS buster
* 21:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS buster
* 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS buster
* 21:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS buster
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 legoktm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Lower CirrusSearch maxqueues to be closer to number of workers (duration: 00m 56s)
* 20:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 19:49 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 urbanecm: Evening B&C window completed
* 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/: {{Gerrit|10b8440069ac71434274462c545c6b2b2c9182d9}}: Use the WikiEditor ready hook instead of using() the lib ([[phab:T296033|T296033]]) (duration: 00m 56s)
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b6b05e30b3c9b4007fd31ab0698507d7a48d1caf}}: kswiki: set wgTranslateNumerals to false ([[phab:T296055|T296055]]) (duration: 00m 55s)
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4aa8d5bf465bfc3fee2ec547718af0c779f88ef4}}: Enable SandboxLink on lawiki ([[phab:T296073|T296073]]) (duration: 00m 56s)
* 19:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c082bec4c74c156b26af4349488835902c5bacd}}: Enable mapframe on the Indonesian Wikipedia ([[phab:T295571|T295571]]) (duration: 00m 56s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:01 vgutierrez: pool cp4032 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 18:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 17:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:48 XioNoX: repool codfw
* 17:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4032.ulsfo.wmnet with OS buster
* 17:46 ejegg: updated fundraising python tools from {{Gerrit|d90f4c91}} -> {{Gerrit|d1d7b100}}
* 17:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:32 ebernhardson: restart both elasticsearch instances on elastic2044, reporting `connection refused` (after a brief period of `no route to host`) to masters even though the connection works outside elastic
* 17:01 ryankemper: [[phab:T295705|T295705]] Beginning rolling restart w/ plugin upgrade of `cloudelastic`: `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic plugin upgrade + restart" --upgrade --nodes-per-run 3 --start-datetime 2021-11-22T16:59:38 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_cloudelastic`
* 17:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 16:58 ryankemper: [Elastic] [[phab:T295705|T295705]] Rolling restart w/ plugin upgrade of `relforge` is complete
* 16:55 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting second and final relforge host: `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 16:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4032.ulsfo.wmnet with OS buster
* 16:52 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting first relforge host: `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 16:51 jayme: fleet wide updated wmf-certificates to 0~20211122-1
* 16:50 vgutierrez: depool cp4032 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:49 ryankemper: [Elastic] [[phab:T295705|T295705]] Downtimed relforge* for 2 hours in order to perform a manual rolling restart of the two hosts `relforge1003` and `relforge1004`
* 16:44 ryankemper: [[phab:T295705|T295705]] Upgrading `relforge` elasticsearch packages: `ryankemper@cumin1001:~$ sudo cumin -b 2 'relforge*' 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins'`
* 16:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:15 urbanecm: Password reset for Miraki@arbcom_dewiki per private request
* 16:15 moritzm: installing postgresql-13 security updates on bullseye
* 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 XioNoX: Telia DDoS auto-mitigation enabled on all circuits - [[phab:T288926|T288926]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 Amir1: revoking DROP for wikiadmin from db1100 ([[phab:T249683|T249683]])
* 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 15:17 moritzm: set kvm:machine_version=pc-i440fx-2.8 for Ganeti cluster in codfw [[phab:T294119|T294119]]
* 15:16 jayme: imported wmf-certificates 0~20211122-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 15:13 _joe_: restarting pybal low-traffic in codfw, eqiad
* 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:58 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.wikimedia.org
* 14:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734426{{!}}Disable DPL on opt-in wikis where not in use (T287916)]] (duration: 00m 56s)
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734425{{!}}Disable DPL on Wikiversities where not in use (T287916)]] (duration: 00m 56s)
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734424{{!}}Disable DPL on Wikisources where not in use (T287916)]] (duration: 00m 56s)
* 14:44 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.wikimedia.org
* 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:06 akosiaris: repool wtp1025, wtp1041 to parsoid cluster. [[phab:T296098|T296098]]
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:32 XioNoX: re-enable pybal on lvs2007 - [[phab:T295118|T295118]]
* 13:31 XioNoX: re-enable puppet on lvs2007
* 13:30 XioNoX: re-enabling V6 between cr2-codfw and asw-b-codfw - [[phab:T295118|T295118]]
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9
* 13:04 XioNoX: asw-b-codfw# set virtual-chassis member 7 mastership-priority 255 - [[phab:T295118|T295118]]
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:51 Lucas_WMDE: UTC morning backport+config window done
* 12:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:45 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: 1.37.0 is out now, so there's no beta [[phab:T289585|T289585]] (duration: 01m 04s)
* 12:11 hashar@deploy1002: Synchronized php-1.38.0-wmf.9/skins/MinervaNeue: Fix banners to show CentralNotice - [[phab:T296077|T296077]] (duration: 01m 04s)
* 11:50 moritzm: installing qemu security updates on bullseye
* 11:46 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:43 moritzm: installing krb5 security updates on stretch
* 11:41 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 oblivian@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:36 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 11:20 XioNoX: re-enable LibertyGlobal in esams
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 11:12 XioNoX: Revert "prepend_as_out for esams/knams"
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS buster
* 10:54 elukey: apt-get purge up to linux-image-4.9.0-14-amd64 on sodium to free /boot space
* 10:49 elukey: `apt-get remove linux-image-4.9.0-5-amd64 linux-image-4.9.0-6-amd64` on sodium to free /boot
* 10:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS buster
* 10:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 10:16 jbond: restart snmp gracefully cr2-eqord
* 10:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:35 moritzm: installing Linux 4.9.272 updates on Stretch hosts
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24b3a7769ca97e3ed951d77d911f41afae5e4136}}: Growth: Disable filtering by unstarred mentees at arwiki, enwiki, fawiki ([[phab:T293182|T293182]]) (duration: 01m 04s)
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:05 moritzm: installing 4.19.208-1 kernels on Stretch hosts with 4.19 kernels
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 moritzm: drain ganeti-test2003 for forthcoming reimage
* 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 08:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|4418c4367b7420139cd8b30cb003d697b58c618f}}: ApiSetMentorStatus: Use READ_LATEST to request back timestamp ([[phab:T295305|T295305]]) (duration: 01m 08s)
* 08:42 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17793 and previous config saved to /var/cache/conftool/dbconfig/20211122-082525-root.json
* 08:15 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17792 and previous config saved to /var/cache/conftool/dbconfig/20211122-081022-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17791 and previous config saved to /var/cache/conftool/dbconfig/20211122-075518-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17790 and previous config saved to /var/cache/conftool/dbconfig/20211122-074015-root.json
* 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17789 and previous config saved to /var/cache/conftool/dbconfig/20211122-072511-root.json
* 07:17 Amir1: running optimize table on image table in commonswiki on codfw with replication enabled; this will cause replication lag ([[phab:T296143|T296143]])
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17788 and previous config saved to /var/cache/conftool/dbconfig/20211122-071006-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17787 and previous config saved to /var/cache/conftool/dbconfig/20211122-065502-root.json
* 06:46 marostegui: Revoke dump grants for scholarships database [[phab:T296166|T296166]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json
* 03:30 Amir1: run optimize table on db2140 for image table ([[phab:T296143|T296143]])


== 2015-07-11 ==
== 2021-11-21 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 13:17 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 10h)
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 07:26 XioNoX: cr1-eqiad# deactivate protocols bgp group Confed_eqord
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 05:22 Amir1: running clean up of djvu files in all wikis ([[phab:T275268|T275268]])
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 05:13 Amir1: end of djvu metadata maint script run ([[phab:T275268|T275268]])
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 04:12 bd808: rebooting logstash1006
* 04:06 bd808: logstash1005 fully recovered all shards
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 00:34 bd808: rebooting logstash1005
* 00:30 bd808: logstash1004 fully recovered all shards


== 2015-07-10 ==
== 2021-11-20 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 01:02 mutante: lists1001 - restarted apache, icinga alerts for the web UI, but recovered
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 00:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 00:26 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 00:25 bblack: lvs3005 - re-enabling puppet + pybal
* 19:43 bd808: rebooting logstash1004
* 00:25 legoktm@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 00:25 legoktm@cumin1001: START - Cookbook sre.network.cf
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 00:24 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 00:23 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 00:06 bblack: lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007)
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 17:26 moritzm: installed python security updates on mc*
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 15:29 mobrovac: restbase restarted restbase on restbase1004
* 15:25 godog: bounce cassandra on restbase1004
* 13:43 godog: bounce cassandra on restbase1004
* 13:37 _joe_: temporarily repooled mw1031
* 12:40 godog: bounce cassandra on restbase1004
* 07:43 godog: reimage ms-be2013 T105213
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 02:59 elee: please log this with the year
* 02:53 andrewbogott: testing the log by logging a test
* 01:50 gwicke: bounced cassandra on restbase1004
* 01:38 jgage: cassandra restarted on restbase1004
* 00:39 urandom: starting restbase1004
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)


== 2015-07-09 ==
== 2021-11-19 ==
* 23:41 legoktm: deployed patch for T105413
* 23:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye
* 23:07 gwicke: bounced cassandra on restbase1004
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 23:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 23:15 mutante: LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) [[phab:T295789|T295789]]
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/ - fixes OAuth which is broken for 1.26wmf13
* 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 20:21 mutante: phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group ([[phab:T295928|T295928]])
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 20:20 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet
* 19:04 gwicke: bounced cassandra on restbase1004
* 20:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 20:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet
* 17:54 gwicke: bounced restbase on restbase1005
* 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch
* 17:32 ori: installed poolcounter on mw1154
* 19:51 mutante: shutting down undead server mw2280 - not in icinga or puppetdb, but in debmonitor, and still has an IP and puppet cert
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 19:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 18:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 16:53 gwicke: bounced cassandra on restbase1004
* 18:10 andrew@deploy1002: Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s)
* 16:48 godog: reboot ms-be2013 T105213
* 18:06 andrew@deploy1002: Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone
* 16:38 gwicke: bounced cassandra on restbase1006
* 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 _joe_: repooling mw1152
* 17:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:57 godog: restart cassandra on restbase1002
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s)
* 15:34 gwicke: bounced cassandra on restbase1004
* 17:21 andrew@deploy1002: Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 gwicke: bounced cassandra on restbase1004
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 16:42 thcipriani@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] [[phab:T296098|T296098]]"
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 16:35 thcipriani: rolling back to group0 for [[phab:T296098|T296098]]
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 16:20 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 15:31 akosiaris: roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041
* 14:24 godog: really upgrade python-django on graphite2001
* 15:29 akosiaris: depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig into the problem before the pressure builds up again.
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following http://www.ubuntu.com/usn/usn-2671-1/
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 11:34 godog: restart cassandra on restbase1001
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 14:15 jayme: fleet wide updated wmf-certificates to 0~20211119-1
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 13:23 moritzm: draining instances from ganeti-test2001 for reimage [[phab:T284811|T284811]]
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 13:02 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 12:10 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 12:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 11:54 hnowlan: roll-restarting cassandra on eqiad maps for java updates
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 11:36 jayme: imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 09:53 XioNoX: run `commit full` on asw-b-codfw - [[phab:T295118|T295118]]
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 09:30 XioNoX: re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - [[phab:T295118|T295118]]
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 02:28 twentyafterfour: restarted phd
* 08:46 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 08:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 08:29 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 08:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes: Backport: [[gerrit:739841{{!}}Revert "Title: use PageStore instead of LinkCache"]] (duration: 01m 03s)
* 02:00 springle: pkg upgrade and restart db1037
* 08:23 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 25s)
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 08:22 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 08:17 moritzm: installing mariadb-10.5 security updates on bullseye (as packaged in Debian, not the wmf-internal packages)
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 06:55 marostegui: Reboot db1132 to pick up new kernel [[phab:T288720|T288720]]
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 06:23 marostegui: Upgrade clouddb1019
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245
* 05:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/media/DjVuImage.php: Backport: [[gerrit:739838{{!}}media: Store metadata of one-page documents correctly (T296001)]] (duration: 00m 56s)
* 02:54 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/modules: Backport: [[gerrit:739837{{!}}Lazy-load structured task JS files (T296049)]] (duration: 00m 55s)
* 02:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 02:02 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 02:01 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 01:57 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2001.codfw.wmnet
* 01:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 01:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 01:45 mutante: I think git-ssh6_22 is down (see alerts lvs2008/2009) due to the v6 issue from ongoing lvs maintenance. depooled in conftool
* 01:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 01:40 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2001.codfw.wmnet
* 01:37 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:35 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Cite/modules/ve-cite/ve.dm.MWReferenceNode.js: Backport for [[phab:T296044|T296044]] (duration: 00m 55s)
* 01:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:31 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
* 01:19 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2002.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2001.codfw.wmnet
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2006.codfw.wmnet
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2005.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2006.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2005.codfw.wmnet
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2006.codfw.wmnet
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2005.codfw.wmnet
* 00:33 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 00:08 brennen: end of UTC late deployment training window


== July 8 ==
== 2021-11-18 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 23:47 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 23:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet,service=miscweb
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 23:28 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:45 mutante: zirconium - stop puppet for role switch
* 23:27 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 22:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 22:48 XioNoX: asw-b-codfw> request system power-off member 7
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 22:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 22:28 mutante: icinga (alert1001) - manually fix IP of mw1488.mgmt (was 0.0.0.0  is: 10.65.1.26) in /etc/icinga/objects/puppet_hosts.cfg , running puppet
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1003.eqiad.wmnet
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 21:53 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1003.eqiad.wmnet
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 21:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1004.eqiad.wmnet
* 20:56 csteipp: deployed patches for T103022 & T103023
* 21:36 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1004.eqiad.wmnet
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 21:31 XioNoX: asw-b-codfw> request system power-off member 7
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1004.eqiad.wmnet
* 20:29 subbu: deployed parsoid sha c4cfc527
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1003.eqiad.wmnet
* 20:15 gwicke: bounced cassandra on restbase1001
* 21:01 ejegg: updated payments-wiki from {{Gerrit|abb2bd9d}} -> {{Gerrit|d1d6f024}}
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 21:00 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 19:32 gwicke: stopped cassandra on restbase1008
* 21:00 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:26 urandom: restbase rolling restart
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 20:51 dcausse: restart blazegraph on wdqs1006 (jvm stuck)
* 17:16 moritzm: installed libwmf security updates on various systems
* 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 17:09 gwicke: bounced cassandra on restbase1004
* 20:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 20:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 20:43 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9 refs [[phab:T293950|T293950]]
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 20:31 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 20:30 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 20:27 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/tests/phpunit/includes/page/PageStoreTest.php: Backport for [[phab:T295931|T295931]] (duration: 01m 03s)
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 20:25 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/includes/page/PageStore.php: Backport for [[phab:T295931|T295931]] (duration: 01m 04s)
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 ori: morebots, are you OK?
* 20:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 09:41 godog: bounce nutcracker on silver
* 20:01 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 19:53 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1004.eqiad.wmnet
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 19:52 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1003.eqiad.wmnet
* 09:06 hashar: Jenkins registering jobs with Zuul
* 19:52 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1006.eqiad.wmnet
* 08:41 hashar: Jenkins is migrating old build histories. Lots of disk I/O happening
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:11 hashar: shutting down Jenkins for upgrade.
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 19:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 19:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4b4c0bca9aa6bceac86f40f03ad688b9d4481c58}}: Enable DiscussionTools automatic topic subscriptions as beta feature on most wikis ([[phab:T290500|T290500]]) (duration: 01m 04s)
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)
* 19:13 twentyafterfour: upgrading php7.3 packages on phab1001
* 19:07 twentyafterfour: rebooting phab2001 to apply updated php and kernel packages
* 19:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 19:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 18:57 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 18:52 XioNoX: asw-b-codfw> request system reboot member 7 - [[phab:T295118|T295118]]
* 18:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 XioNoX: asw-b-codfw> request system power-off member 7 - [[phab:T295118|T295118]]
* 15:39 XioNoX: lvs2007:~$ sudo service pybal stop - [[phab:T295118|T295118]]
* 15:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:35 XioNoX: cr2-codfw# set interfaces et-1/0/3 disable - [[phab:T295118|T295118]]
* 15:34 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 15:16 hnowlan: roll restarting cassandra on codfw maps for java updates
* 15:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:38 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:37 hnowlan: roll-restarting sessionstore for java updates
* 14:19 moritzm: installing testvm2003
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
* 13:34 moritzm: installing pam bugfix updates on bullseye hosts
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 13:22 moritzm: failover ganeti master in test cluster to ganeti-test2002 [[phab:T284811|T284811]]
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcephosd1016.wikimedia.org
* 12:23 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcephosd1016.wikimedia.org
* 12:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1025.eqiad.wmnet
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1025.eqiad.wmnet
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1026.eqiad.wmnet
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1026.eqiad.wmnet
* 12:15 marostegui: Upgrade dbstore1007 to 10.4.22 [[phab:T290841|T290841]] [[phab:T295970|T295970]]
* 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739550{{!}}Enable Tamil (ta) Section Translation in test wiki (T294223)]] (duration: 01m 05s)
* 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS buster
* 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS buster
* 11:29 arturo: aborrero@apt1001:~$ sudo -i reprepro export
* 11:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS buster
* 11:26 arturo: aborrero@apt1001:~$ sudo -i reprepro processincoming default /srv/wikimedia/incoming/python-flask-keystone_0.2~git20201012.b5cd4da-1_amd64.changes ([[phab:T295234|T295234]])
* 11:08 arturo: run aborrero@apt1001:~$ sudo -i reprepro processincoming default
* 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 11:07 arturo: added python-flask-oslolog_0.1~git20201012.7803a46-1 to bullseye-wikimedia ([[phab:T295234|T295234]])
* 11:06 arturo: aborrero@apt1001:~ $ for i in $(ll /srv/wikimedia/incoming/ {{!}} grep aborrero {{!}} awk -F' ' '<nowiki>{</nowiki>print $NF<nowiki>}</nowiki>') ; do rm /srv/wikimedia/incoming/$i ; done
* 11:05 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS buster
* 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 10:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS buster
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2002.codfw.wmnet with OS buster
* 10:17 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS buster
* 10:12 topranks: Re-pooling eqiad in DNS after completing iBGP policy changes on cr1-eqiad and cr2-eqiad [[phab:T295672|T295672]]
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:01 moritzm: updating perf on buster hosts
* 10:00 topranks: Re-enabling Equinix IXP port on cr1-eqiad following iBGP changes to address [[phab:T295650|T295650]]
* 09:56 ema: cp4021: repool w/ single backend experiment enabled [[phab:T288106|T288106]]
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS buster
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:41 ema: cp4021: stop ats-be and clear its cache [[phab:T288106|T288106]]
* 09:35 ema: cp4021: depool to enable single backend experiment [[phab:T288106|T288106]]
* 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS buster
* 09:32 vgutierrez: pool cp1090 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:18 jayme: systemctl start prune-production-images.service on deneb - [[phab:T287222|T287222]]
* 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS buster
* 08:46 vgutierrez: depool cp1090 to be reimaged as cache::upload_haproxy - [[phab:T290005|T290005]]
* 08:45 moritzm: installing mariadb-10.3 security updates on buster (as packaged in Debian, not the wmf-internal packages)
* 08:27 topranks: De-pool of Eqiad seems to be ok, transit/peering/transport links changed BW profile but nothing maxed, total LVS connections steady but have shifted to codfw. Proceeding to reconfigure iBGP policy on cr1-eqiad and cr2-eqiad manually.
* 08:01 topranks: Depooling eqiad in authdns to allow for reconfiguration of CR routers on site ([[phab:T295672|T295672]])
* 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/maintenance/migrateRevisionActorTemp.php: Backport: [[gerrit:739636{{!}}maintenance: Add waitForReplication and sleep in migrateRevisionActorTemp (T275246)]] (duration: 01m 04s)
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17772 and previous config saved to /var/cache/conftool/dbconfig/20211118-073507-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17771 and previous config saved to /var/cache/conftool/dbconfig/20211118-072004-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17770 and previous config saved to /var/cache/conftool/dbconfig/20211118-070620-marostegui.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17769 and previous config saved to /var/cache/conftool/dbconfig/20211118-070559-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17768 and previous config saved to /var/cache/conftool/dbconfig/20211118-070500-root.json
* 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17767 and previous config saved to /var/cache/conftool/dbconfig/20211118-065055-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17766 and previous config saved to /var/cache/conftool/dbconfig/20211118-064957-root.json
* 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17765 and previous config saved to /var/cache/conftool/dbconfig/20211118-063552-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17764 and previous config saved to /var/cache/conftool/dbconfig/20211118-063453-root.json
* 06:31 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 ([[phab:T249683|T249683]])
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17763 and previous config saved to /var/cache/conftool/dbconfig/20211118-062048-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17762 and previous config saved to /var/cache/conftool/dbconfig/20211118-061949-root.json
* 06:17 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1156 ([[phab:T249683|T249683]])
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17761 and previous config saved to /var/cache/conftool/dbconfig/20211118-060446-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17760 and previous config saved to /var/cache/conftool/dbconfig/20211118-054942-root.json
* 05:47 marostegui: Upgrade clouddb1014
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17759 and previous config saved to /var/cache/conftool/dbconfig/20211118-053438-root.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 due to network issues ([[phab:T295952|T295952]])', diff saved to https://phabricator.wikimedia.org/P17758 and previous config saved to /var/cache/conftool/dbconfig/20211118-050802-ladsgroup.json
* 04:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2006.codfw.wmnet
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2005.codfw.wmnet
* 01:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2006.codfw.wmnet
* 01:48 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2006.codfw.wmnet
* 01:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2005.codfw.wmnet
* 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:42 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2005.codfw.wmnet
* 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP - Config: [[gerrit:739633{{!}}Revert "Stop setting wgActorTableSchemaMigrationStage, no longer read in core" (T275246)]] (duration: 01m 04s)
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2006.codfw.wmnet with OS stretch
* 00:28 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2006.codfw.wmnet with OS stretch
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2005.codfw.wmnet with OS stretch
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 ryankemper: [[phab:T290902|T290902]] Test host looks good, proceeding to rest of fleet `ryankemper@cumin1001:~$ sudo cumin -b 4 '*elastic*' 'sudo run-puppet-agent --force'`
* 00:18 urbanecm: UTC late B&C finished
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 ryankemper: [[phab:T290902|T290902]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379; running puppet agent on arbitrary elastic host: `ryankemper@elastic1051:~$ sudo run-puppet-agent --force`
* 00:17 ryankemper: [[phab:T290902|T290902]] Disabling puppet across all elastic*: `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379"'`
* 00:16 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5110fe77bb982cca82c8d474339a2b73d02c8024}}: Migrate wmfHostnames to wmgHostnames ([[phab:T45956|T45956]]) (duration: 01m 03s)
* 00:12 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/brwikimedia.png and respective HD variants
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:08 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|59c3fe66a0d140ae21f7269150a256a5e9786b24}}: Lossless optimization of the brwikimedia logo (duration: 01m 04s)
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2005.codfw.wmnet with OS stretch


== July 7 ==
== 2021-11-17 ==
* 23:54 jgage: kafka brokers 1018 & 1021 were demoted; i have triggered a leader election and they are leaders again
* 23:53 eileen: * revision {{Gerrit|8054869b}} -> {{Gerrit|b3e2a122}} (latest)
* 23:05 logmsgbot: catrope Synchronized visualeditor-default.dblist: Enable VE by default on labswiki (duration: 00m 12s)
* 23:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
* 21:56 hoo: Restarted hhvm on mw1003 "Fatal error: Function already defined: wmfLoadInitialiseSettings in /srv/mediawiki/wmf-config/CommonSettings.php on line 187"
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
* 21:16 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/resourceloader/ResourceLoader.php: T104769 (duration: 00m 13s)
* 23:45 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 20:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf13
* 23:45 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1006.eqiad.wmnet
* 20:00 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf13 and rebuild l10n cache (duration: 39m 41s)
* 23:44 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
* 19:47 gwicke: restarted cassandra on restbase1005
* 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet
* 19:20 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf13 and rebuild l10n cache
* 23:35 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1005.eqiad.wmnet
* 19:15 moritzm: installed PHP security updates on all trusty hosts
* 23:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:58 ejegg: updated payments from a17ee221db0dbde70c92e24fc188379b6dbad613 to ec34ebf61e5962f66b807abdcb519ff323d41e8e
* 23:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:08 twentyafterfour: restarted apache2 on iridium (phab hotfix)
* 22:42 mutante: miscweb1002/2002 - moved /srv/deployment/scholarships to /root/ ([[phab:T243037|T243037]])
* 17:10 robh: OTRS update appears to be functioning normally. As such, ending maintenance window.
* 21:42 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 27s)
* 17:06 robh: otrs is now using the new sha256 cert
* 21:41 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 17:00 robh: starting otrs maint window
* 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:58 _joe_: restarted HHVM on mw1026, near to OOM
* 21:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:47 twentyafterfour: applied hotfix for phabricator bug: https://secure.phabricator.com/D13544
* 20:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:36 mutante: protactinium - manual iptables rules replaced by puppet/ferm rules
* 20:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:11 logmsgbot: thcipriani Synchronized php-1.26wmf12/extensions/ContentTranslation/extension.json: Remove default value for ContentTranslationCampaigns (duration: 00m 12s)
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 jynus: manually editing table mediawiki.ipblocks to fully solve a former software bug
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 Jeff_Green: ptr records for frack/codfw and authdns-update
* 20:33 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.7"
* 15:10 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable ContentTranslation in enwiki [[gerrit:222991]] (duration: 00m 13s)
* 20:23 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 14:21 jynus: dropping optin_survey_old table from enwiki
* 20:22 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 13:23 akosiaris: restarting gitblit on antimony
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 mobrovac: restbase restarted cassandra on rb1005
* 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 godog: restart cassandra on restbase1004, heap exhausted
* 19:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/export/WikiExporter.php: Backport: [[gerrit:739491{{!}}export: Ignore rev_page_id index (T285149)]] (duration: 01m 04s)
* 10:49 godog: restarted cassandra on restbase1005, mutations through the roof
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:27 godog: set operations/puppet/cassandra git submodule repo as hidden
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul  7 06:11:46 UTC 2015 (duration 11m 45s)
* 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:51 logmsgbot: krinkle Synchronized php-1.26wmf12/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.js: I3e965dda1c4 (duration: 00m 12s)
* 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e167a53cec3c3b216100bab686f28e09c424435}}: Disable local file upload on the Chinese Wikisource ([[phab:T295265|T295265]]) (duration: 01m 05s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-07 02:27:55+00:00
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 09s)
* 19:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 01:12 ori: Re-pooled mw1152 at 20:46 UTC, did not log it then.
* 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b3a1d976cb1ef931c809b3670fb8c8b3f3a56e7}}: Make reply tool available as opt-out on commonswiki ([[phab:T295838|T295838]]) (duration: 01m 05s)
* 00:41 springle: upgrade db1041 trusty
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:37 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/CentralAuth/includes/CreateLocalAccountJob.php: https://gerrit.wikimedia.org/r/#/c/223211/ (duration: 00m 13s)
* 18:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS buster
* 18:57 ejegg: updated fundraising CiviCRM from {{Gerrit|9c5f0b69}} -> {{Gerrit|8054869b}}
* 18:56 vgutierrez: pool cp2042 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 18:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS buster
* 18:05 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:01 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 vgutierrez: depool cp2042 to be reimaged as an HAProxy cache upload node - [[phab:T290005|T290005]]
* 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 17:25 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2002.codfw.wmnet
* 17:11 XioNoX: repool Telia eqiad-codfw transport
* 17:10 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
* 16:34 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts rpki2001.codfw.wmnet
* 16:32 mutante: LDAP - added jkieserman to wmf ([[phab:T295693|T295693]])
* 16:28 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 16:28 XioNoX: drain Telia eqiad-codfw link
* 16:27 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts rpki2001.codfw.wmnet
* 16:21 XioNoX: move cr1-codfw<->cr2-eqdfw link to BO cable
* 16:19 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 16:06 XioNoX: move cr1-codfw:xe-5/3/0 to BO cable
* 16:04 XioNoX: re-enable Telia BGP on cr1-codfw
* 16:01 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 15:59 bblack: netbox: added ganeti01 and ganeti02 cluster definitions for drmrs
* 15:58 XioNoX: disable Telia BGP on cr1-codfw
* 15:55 XioNoX: move codfw-ulsfo link to break-out cable
* 15:46 mutante: restarting pybal on lvs1015
* 15:43 _joe_: restarting pybal on lvs2009
* 15:42 mutante: restarting pybal on lvs1016
* 15:39 _joe_: restarting pybal on lvs2010
* 15:35 XioNoX: drain ulsfo-codfw link
* 14:47 moritzm: installing perl bugfix updates from Bullseye point release
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights on s5 special slaves in eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17755 and previous config saved to /var/cache/conftool/dbconfig/20211117-134942-marostegui.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17754 and previous config saved to /var/cache/conftool/dbconfig/20211117-134835-marostegui.json
* 13:20 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1001-dev.eqiad.wmnet
* 13:10 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1001-dev.eqiad.wmnet
* 13:02 Lucas_WMDE: UTC morning backport+config window done
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739467{{!}}Enable disambiguator notifications on 6 Wikipedias (T293319)]] (duration: 01m 04s)
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
* 12:17 topranks: Re-pooling ulsfo after completing routing changes on cr3-ulsfo and cr4-ulsfo ([[phab:T295672|T295672]])
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.