You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set archive namespaces on foundationwiki to 'noindex,follow' (T288763) (duration: 00m 59s))
imported>Stashbot
(ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance)
 
(296 intermediate revisions by 3 users not shown)
== 2022-07-02 ==
* 00:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance

== 2021-08-12 ==
* 23:50 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:712732{{!}}Set archive namespaces on foundationwiki to 'noindex,follow' (T288763)]] (duration: 00m 59s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 cjming@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: [[gerrit:711719{{!}}Add Link: fix invalidation on non-addlink edit (T283606)]] (duration: 01m 00s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:09 tgr: [[phab:T283867|T283867]] running userOptions.php on Growth wikis as per [[phab:T283867|T283867]]#7280296
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:57 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711721{{!}}Don't generate HTML when asking for ParserOutput (T288639)]] (duration: 00m 58s)
* 21:52 urbanecm: Run `mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=$WIKI --jobqueue` for a bunch of Translate-enabled wikis ([[phab:T288683|T288683]])
* 21:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:30 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 21:13 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: sync {{Gerrit|Ic27418a0ec976347be5fa586bbd32cc4a0d8d511}} to unblock the train refs [[phab:T288775|T288775]] and [[phab:T281159|T281159]] (duration: 01m 07s)
* 20:56 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwikidatawiki --jobqueue # [[phab:T288683|T288683]], errored out
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwiki --jobqueue # [[phab:T288683|T288683]]
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:24 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # [[phab:T288683|T288683]]
* 20:13 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # [[phab:T288683|T288683]]
* 19:43 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Translate/src/PageTranslation/TranslationPage.php: sync {{Gerrit|I2f46abb20145630c27449ce57f1256e92f440144}} which should fix [[phab:T288683|T288683]] & [[phab:T288700|T288700]] thus unblocking the train: [[phab:T281159|T281159]] (duration: 01m 07s)
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4002.wikimedia.org
* 16:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
* 16:33 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1005:  (duration: 00m 15s)
* 16:32 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1005:
* 16:32 effie: enabling puppet on mediawiki servers && rolling restart of mcrouter
* 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1006:  (duration: 00m 15s)
* 16:31 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1006:
* 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1007:  (duration: 00m 15s)
* 16:30 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1007:
* 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1008:  (duration: 00m 15s)
* 16:29 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1008:
* 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1009:  (duration: 00m 17s)
* 16:28 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1009:
* 16:27 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1010:  (duration: 00m 15s)
* 16:27 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1010:
* 16:26 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2005:  (duration: 00m 24s)
* 16:26 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2005:
* 16:24 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2006:  (duration: 00m 23s)
* 16:24 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2006:
* 16:23 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2007:  (duration: 00m 27s)
* 16:23 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2007:
* 16:22 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2008:  (duration: 00m 24s)
* 16:21 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2008:
* 16:16 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2009:  (duration: 00m 24s)
* 16:15 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2009:
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:14 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2010:  (duration: 00m 23s)
* 16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2010:
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s)
* 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5
* 15:50 papaul: powerdown ms-be2060 for relocation
* 15:49 mutante: netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the earlier error when running the makevm cookbook ([[phab:T288630|T288630]])
* 15:47 mutante: netbox - deleted 198.35.26.6 (doh4002)
* 15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org
* 15:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
* 15:33 moritzm: importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8  [[phab:T287960|T287960]]
* 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet
* 15:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 15:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 15:00 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet
* 14:48 papaul: factory reset ps-test-d8-codfw
* 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 14:33 papaul: factory reset ps2-test-d8-codfw
* 14:25 hnowlan: reenabling puppet on P:cassandra
* 13:57 hnowlan: disabling puppet on P:cassandra to test removal of cassandra-metrics-agent
* 13:50 effie: disable puppet on mediawiki hosts to merge 705852
* 13:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 13:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet
* 13:20 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet
* 13:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 12:43 godog: upgrade NIC firmware on thanos-be2* / thanos-fe2* - [[phab:T286722|T286722]]
* 12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 12:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:09 godog: upgrade NIC firmware on thanos-be1* - [[phab:T286722|T286722]]
* 12:08 godog: upgrade NIC firmware on thanos-fe100[34] - [[phab:T286722|T286722]]
* 12:04 godog: upgrade NIC firmware on thanos-fe100[12] - [[phab:T286722|T286722]]
* 11:57 moritzm: installing openexr security updates
* 11:47 moritzm: installing bluez security updates on buster
* 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts
* 10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json
* 10:18 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:13 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:08 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 09:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/: Backport: [[gerrit:711714{{!}}Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (2/2) (duration: 01m 12s)
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree [[phab:T284825|T284825]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree [[phab:T284825|T284825]]
* 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/data-access/: Backport: [[gerrit:711714{{!}}Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (1/2) (duration: 01m 08s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P17015 and previous config saved to /var/cache/conftool/dbconfig/20210812-092909-root.json
* 09:28 kormat: reconfiguring replication tree for pc1 [[phab:T284825|T284825]]
* 09:27 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2011 to primary of pc1 [[phab:T284825|T284825]] (duration: 01m 10s)
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 80%: After reimage', diff saved to https://phabricator.wikimedia.org/P17014 and previous config saved to /var/cache/conftool/dbconfig/20210812-091406-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 60%: After reimage', diff saved to https://phabricator.wikimedia.org/P17013 and previous config saved to /var/cache/conftool/dbconfig/20210812-085902-root.json
* 08:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: [[phab:T288725|T288725]]
* 08:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: [[phab:T288725|T288725]]
* 08:53 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Adding new pc hosts (duration: 01m 09s)
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 08:48 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P17012 and previous config saved to /var/cache/conftool/dbconfig/20210812-084359-root.json
* 08:43 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 08:38 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
* 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 08:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 40%: After reimage', diff saved to https://phabricator.wikimedia.org/P17011 and previous config saved to /var/cache/conftool/dbconfig/20210812-082855-root.json
* 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 30%: After reimage', diff saved to https://phabricator.wikimedia.org/P17010 and previous config saved to /var/cache/conftool/dbconfig/20210812-081351-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 20%: After reimage', diff saved to https://phabricator.wikimedia.org/P17009 and previous config saved to /var/cache/conftool/dbconfig/20210812-075848-root.json
* 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 07:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 15%: After reimage', diff saved to https://phabricator.wikimedia.org/P17008 and previous config saved to /var/cache/conftool/dbconfig/20210812-074344-root.json
* 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P17007 and previous config saved to /var/cache/conftool/dbconfig/20210812-072841-root.json
* 07:26 godog: temp upgrade thanos to 0.22.0 on thanos-fe2001 to help debug a potential upstream issue
* 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P17006 and previous config saved to /var/cache/conftool/dbconfig/20210812-071337-root.json
* 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P17005 and previous config saved to /var/cache/conftool/dbconfig/20210812-065833-root.json
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: fix for [[phab:T288711|T288711]] failure of election creation (duration: 01m 09s)
* 06:47 moritzm: updating bullseye installations to the latest state of testing
* 06:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:36 moritzm: installing c-ares security updates on Bullseye
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:00 marostegui: Failover m3 from db1132 to db1107 - [[phab:T288197|T288197]]
* 05:15 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal after nuking wdqs2004's" --blazegraph_instance blazegraph`
* 05:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:14 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:45 eileen: tools revision changed from {{Gerrit|c26a8c0cb6}} to {{Gerrit|15bfaa7117}}
* 04:44 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:43 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 02m 07s)
* 04:41 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
* 04:41 ryankemper: [WDQS Deploy] Re-rolling deploy so that `wdqs2004` gets deployed to
* 04:41 ryankemper: [WDQS] `wdqs2004`'s disk is full due to overinflated `wikidata.jnl`, nuking and depooling: `sudo rm -fv /srv/wdqs/wikidata.jnl && sudo depool`
* 04:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 17m 03s)
* 04:26 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.81` on canary `wdqs1003`; proceeding to rest of fleet
* 04:23 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
* 04:21 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.81`. Pre-deploy tests passing on canary `wdqs1003`
* 03:40 eileen: process-control config revision is {{Gerrit|7bdc78073d}}
* 03:01 eileen: civicrm revision changed from {{Gerrit|d8ebf45819}} to {{Gerrit|f3895dc907}}, config revision is {{Gerrit|7bdc78073d}}
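The db2107 `dbctl commit` entries in this day's log record a staged repool after a reimage: the pooled share stepped through 1, 5, 10, 15, 20, 30, 40, 50, 60, 80 and 100% at roughly 15-minute intervals starting at 06:58. The Python sketch below only reproduces that cadence for reference. The step list and the interval are read off this log rather than taken from dbctl or conftool, and `repool_schedule` is a hypothetical helper that is not part of either tool.

```python
from datetime import datetime, timedelta

# Percentages observed in the "db2107 (re)pooling @ N%" commits in this log.
# Assumption: read off the log entries, not taken from dbctl/conftool.
STEPS = [1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100]


def repool_schedule(start, steps=STEPS, interval_minutes=15):
    """Hypothetical helper: return (HH:MM, percent) pairs, one step every interval_minutes."""
    schedule = []
    for i, pct in enumerate(steps):
        when = start + timedelta(minutes=i * interval_minutes)
        schedule.append((when.strftime("%H:%M"), pct))
    return schedule


if __name__ == "__main__":
    # First commit in the log was at 06:58 on 2021-08-12.
    for when, pct in repool_schedule(datetime(2021, 8, 12, 6, 58)):
        print(f"{when}  db2107 @ {pct}%")
```

The actual commits drifted by one minute from the 60% step onward (08:59, 09:14, 09:29), so the generated times are approximate.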


== 2022-07-01 ==
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T309311|T309311]])', diff saved to https://phabricator.wikimedia.org/P30753 and previous config saved to /var/cache/conftool/dbconfig/20220701-235524-ladsgroup.json
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30752 and previous config saved to /var/cache/conftool/dbconfig/20220701-234019-ladsgroup.json
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all

== 2021-08-11 ==
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 2/2 (duration: 01m 08s)
* 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 1/2 (duration: 01m 08s)
* 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:32 legoktm@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 07s)
* 22:30 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 08s)
* 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:710725{{!}}Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 08s)
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:29 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711706{{!}}Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 07s)
* 21:18 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:58 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --move-talk --add-prefix=[[phab:T288643|T288643]] --fix # [[phab:T288643|T288643]]
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Popups: Log VirtualPageView events to Event Platform ([[phab:T288655|T288655]]) (duration: 01m 06s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:20 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Popups: Log VirtualPageView events to Event Platform ([[phab:T288655|T288655]]) (duration: 01m 09s)
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]] (duration: 01m 08s)
* 19:28 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 19:01 jgleeson: payments-wiki updated from {{Gerrit|a70aaa7944}} to {{Gerrit|0a27dbe9b6}}
* 18:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 18:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 18:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 18:23 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 18:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 18:22 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:21 bstorm: removed thirdparty/kubeadm-k8s-1-17 from reprepro
* 18:21 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:20 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:19 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill (duration: 02m 21s)
* 18:02 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill
* 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/pagers/ContribsPager.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (3/3) (duration: 01m 06s)
* 17:34 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionFactory.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (2/3) (duration: 01m 08s)
* 17:32 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionStore.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (1/3) (duration: 01m 09s)
* 16:22 dancy: Results of testing php_fpm_always_restart: php_fpm_always_restart=false: 1m19.942s; php_fpm_always_restart=true: 3m12.836s
* 16:19 dancy@deploy1002: Synchronized README: Testing scap php-rpm rolling restart (after) (duration: 03m 12s)
* 16:16 thcipriani: moment of truth for php-fpm-always-restart in scap
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 16:05 dancy@deploy1002: Synchronized README: Testing scap php-rpm rolling restart (before) (duration: 01m 19s)
* 15:37 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 15:12 moritzm: import openjdk-8 8u302-b08-1+wmf1 to bullseye-wikimedia (bootstrap build, not to be used yet) [[phab:T287960|T287960]]
* 15:02 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast4002.wikimedia.org
* 14:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 14:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
* 14:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4002.wikimedia.org
* 14:44 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
* 14:44 sukhe: s/depool/decommission bast4002.wikimedia.org - [[phab:T288579|T288579]]
* 14:43 sukhe: depool bast4002.wikimedia.org - [[phab:T288579|T288579]]
* 14:23 moritzm: installing mx2002 [[phab:T286911|T286911]]
* 14:21 hnowlan: disabled cassandra-metrics-collector on maps*
* 13:33 moritzm: installing Java 8/Java 11 security updates on various analytics hosts
* 13:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 12:45 moritzm: imported openjdk-8 8u302-b08-1~deb10u1 to component/jdk8 for buster-wikimedia (forward port of the latest Java 8 security release)
* 12:32 godog: roll-restart prometheus [[phab:T284213|T284213]]
* 12:16 moritzm: installing c-ares security updates on stretch
* 12:16 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 12:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 Lucas_WMDE: EU backport+config window done
* 11:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711141{{!}}Remove $wmgWikibaseClientEntityNamespaces (T257260)]] (duration: 01m 08s)
* 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711140{{!}}Stop setting $wgWBClientSettings['entityNamespaces'] (T257260)]] (duration: 01m 07s)
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711139{{!}}Remove $wmgWikibaseRepoEntityNamespaces (T257260)]] (duration: 01m 08s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711138{{!}}Stop setting $wgWBRepoSettings['entityNamespaces'] (T257260)]] (duration: 01m 08s)
* 11:17 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:17 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/: Backport: [[gerrit:710720{{!}}Add ad-hoc logging to tally process (T288366)]] (duration: 01m 09s)
* 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711248{{!}}Disable Collection sidebar link on English Wikisource (T288021)]] (duration: 01m 14s)
* 10:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:42 moritzm: rolling restart of Buster-based maps services to pick up c-ares security updates
* 10:37 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:20 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 09:50 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/SpecialWhatLinksHere.php: Backport: [[gerrit:710719{{!}}Fix SelectQueryBuilder use in SpecialWhatLinksHere (T288565)]] (duration: 01m 08s)
* 09:50 godog: upgrade thanos on cloudmetrics* - [[phab:T288604|T288604]]
* 09:26 godog: upgrade thanos on prometheus* - [[phab:T288604|T288604]]
* 09:21 elukey: run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full)
* 09:19 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:15 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:05 godog: upgrade thanos on thanos-fe* - [[phab:T288604|T288604]]
* 08:23 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Minor cleanup of parsercache entries (duration: 01m 17s)
* 08:19 moritzm: restart Aphlict to pick up c-ares security updates
* 08:17 moritzm: restart Turnilo on an-tool1007 to pick up c-ares security updates
* 08:02 moritzm: rolling restart of AQS to pick up the c-ares security update
* 07:09 moritzm: restart etherpad-lite on etherpad1002 to pick up c-ares security updates
* 06:59 _joe_: deleting the staging deployment of mwdebug
* 05:55 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
* 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
* 05:22 marostegui: Stop replication on db2107 [[phab:T287454|T287454]]
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16999 and previous config saved to /var/cache/conftool/dbconfig/20210811-051856-marostegui.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2104 to s2 master and set section read-write [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16998 and previous config saved to /var/cache/conftool/dbconfig/20210811-051041-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16997 and previous config saved to /var/cache/conftool/dbconfig/20210811-050040-marostegui.json
* 05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - [[phab:T287454|T287454]]
* 04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16996 and previous config saved to /var/cache/conftool/dbconfig/20210811-041625-root.json
* 04:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 04:15 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 03:45 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 03:45 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 01:49 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 05s)
* 01:49 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
* 01:47 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 06s)
* 01:47 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
* 01:38 legoktm@deploy1002: Synchronized docroot/noc/conf/index.php: noc: Expose primary datacenter on conf/ (duration: 01m 06s)
* 01:22 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 01:22 bstorm@cumin1001: Added views for new wiki: jvwikisource [[phab:T286245|T286245]]
* 01:00 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 00:38 bstorm@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 00:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
 
== 2021-08-10 ==
* 23:33 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710344{{!}}Enable user links feature for pilot wikis, modern vector (T288274)]] (duration: 01m 08s)
* 23:18 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I13e88c303a}}, [[phab:T284418|T284418]] (duration: 01m 07s)
* 23:02 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:58 eileen: process-control config revision is {{Gerrit|7bdc78073d}}
* 22:50 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I8052636}}, {{Gerrit|I2038702b7e0}} (duration: 01m 21s)
* 21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
* 21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
* 21:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
* 21:42 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
* 21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
* 21:40 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
* 21:40 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo pool`
* 21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
* 21:40 ryankemper: [[phab:T288501|T288501]] `ryankemper@wdqs2003:~$ sudo pool`
* 21:38 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
* 21:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
* 21:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
* 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
* 21:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
* 21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
* 21:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.37.0-wmf.18"
* 21:02 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I3b54d163b6}} (duration: 01m 09s)
* 20:54 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|If7a8d6b6}} (duration: 01m 22s)
* 20:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
* 20:42 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|Ic5ff34b}} (duration: 01m 08s)
* 20:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
* 20:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
* 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
* 20:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
* 20:31 krinkle@deploy1002: Synchronized docroot/noc/: {{Gerrit|Ic013a93998f}} (duration: 01m 37s)
* 20:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
* 20:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
* 20:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
* 20:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
* 20:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
* 19:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
* 19:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
* 19:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
* 19:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
* 19:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
* 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs [[phab:T281159|T281159]]
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
* 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
* 18:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag)
* 18:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:46 ryankemper: [[phab:T288501|T288501]] (Misread grafana graph, `wdqs2003` only has 1.33 hours to catch up on)
* 18:45 ryankemper: [[phab:T288501|T288501]] `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled
* 18:42 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:23 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s)
* 17:19 ryankemper: [[phab:T288501|T288501]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001`


==Archives==

Latest revision as of 00:45, 2 July 2022

2022-07-02

  • 00:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance

2022-07-01

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30753 and previous config saved to /var/cache/conftool/dbconfig/20220701-235524-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30752 and previous config saved to /var/cache/conftool/dbconfig/20220701-234019-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30751 and previous config saved to /var/cache/conftool/dbconfig/20220701-232514-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30750 and previous config saved to /var/cache/conftool/dbconfig/20220701-231009-ladsgroup.json
  • 23:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30749 and previous config saved to /var/cache/conftool/dbconfig/20220701-221438-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30748 and previous config saved to /var/cache/conftool/dbconfig/20220701-221418-ladsgroup.json
  • 22:12 mutante: restbase2018 - attempting power cycle via mgmt - /admin1-> racadm serveraction powercycle (T311890)
  • 22:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1014.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1013.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1008.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1010.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30747 and previous config saved to /var/cache/conftool/dbconfig/20220701-215913-ladsgroup.json
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:48 mutante: https://doc.wikimedia.org switched to doc1002 backend on buster T247653
  • 21:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1009.eqiad.wmnet with OS bullseye
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30746 and previous config saved to /var/cache/conftool/dbconfig/20220701-214408-ladsgroup.json
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1010.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1008.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1013.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1014.eqiad.wmnet with OS bullseye
  • 21:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1006.eqiad.wmnet with OS bullseye
  • 21:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30745 and previous config saved to /var/cache/conftool/dbconfig/20220701-212903-ladsgroup.json
  • 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host stat1009.eqiad.wmnet with OS bullseye
  • 21:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:09 mutante: https://doc.wikimedia.org - scheduled maintenance period - switching to buster backend doc1002 (T247653)
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30744 and previous config saved to /var/cache/conftool/dbconfig/20220701-203251-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30743 and previous config saved to /var/cache/conftool/dbconfig/20220701-203231-ladsgroup.json
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30742 and previous config saved to /var/cache/conftool/dbconfig/20220701-201726-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30741 and previous config saved to /var/cache/conftool/dbconfig/20220701-200221-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30740 and previous config saved to /var/cache/conftool/dbconfig/20220701-194716-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30739 and previous config saved to /var/cache/conftool/dbconfig/20220701-183504-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30738 and previous config saved to /var/cache/conftool/dbconfig/20220701-183444-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30737 and previous config saved to /var/cache/conftool/dbconfig/20220701-181939-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30736 and previous config saved to /var/cache/conftool/dbconfig/20220701-180434-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30735 and previous config saved to /var/cache/conftool/dbconfig/20220701-174929-ladsgroup.json
  • 17:47 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 17:47 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30734 and previous config saved to /var/cache/conftool/dbconfig/20220701-165407-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30733 and previous config saved to /var/cache/conftool/dbconfig/20220701-165347-ladsgroup.json
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30732 and previous config saved to /var/cache/conftool/dbconfig/20220701-163842-ladsgroup.json
  • 16:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS bullseye
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30731 and previous config saved to /var/cache/conftool/dbconfig/20220701-162337-ladsgroup.json
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30730 and previous config saved to /var/cache/conftool/dbconfig/20220701-160831-ladsgroup.json
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2167.codfw.wmnet with OS bullseye
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2166.codfw.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 15:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 15:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 15:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore[1008-1009]
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30729 and previous config saved to /var/cache/conftool/dbconfig/20220701-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2167.codfw.wmnet with OS bullseye
  • 14:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2166.codfw.wmnet with OS bullseye
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudstore[1008-1009]
  • 14:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 14:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30728 and previous config saved to /var/cache/conftool/dbconfig/20220701-135831-ladsgroup.json
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30727 and previous config saved to /var/cache/conftool/dbconfig/20220701-134326-ladsgroup.json
  • 13:43 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30726 and previous config saved to /var/cache/conftool/dbconfig/20220701-132821-ladsgroup.json
  • 13:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 13:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:19 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:19 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30725 and previous config saved to /var/cache/conftool/dbconfig/20220701-131316-ladsgroup.json
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2155 to s4 T311493', diff saved to https://phabricator.wikimedia.org/P30724 and previous config saved to /var/cache/conftool/dbconfig/20220701-130106-marostegui.json
  • 12:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:37 moritzm: uploaded rsyslog 8.2102.0-2+deb11u1+wmf2 to component/rsyslog-k8s (backport of latest security fixes on top of the rsyslog with mmkubernetes plugin)
  • 12:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30723 and previous config saved to /var/cache/conftool/dbconfig/20220701-120657-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30722 and previous config saved to /var/cache/conftool/dbconfig/20220701-120636-ladsgroup.json
  • 12:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30721 and previous config saved to /var/cache/conftool/dbconfig/20220701-115414-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30720 and previous config saved to /var/cache/conftool/dbconfig/20220701-115131-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30719 and previous config saved to /var/cache/conftool/dbconfig/20220701-113909-ladsgroup.json
  • 11:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 11:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30718 and previous config saved to /var/cache/conftool/dbconfig/20220701-113626-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30717 and previous config saved to /var/cache/conftool/dbconfig/20220701-112404-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30716 and previous config saved to /var/cache/conftool/dbconfig/20220701-112121-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30715 and previous config saved to /var/cache/conftool/dbconfig/20220701-110859-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30714 and previous config saved to /var/cache/conftool/dbconfig/20220701-110204-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30713 and previous config saved to /var/cache/conftool/dbconfig/20220701-110117-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30712 and previous config saved to /var/cache/conftool/dbconfig/20220701-104612-ladsgroup.json
  • 10:45 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 10:45 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:44 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 10:44 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30711 and previous config saved to /var/cache/conftool/dbconfig/20220701-103107-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30710 and previous config saved to /var/cache/conftool/dbconfig/20220701-102810-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30709 and previous config saved to /var/cache/conftool/dbconfig/20220701-101602-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30708 and previous config saved to /var/cache/conftool/dbconfig/20220701-094927-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui: Stop mysql on db2073 for cloning db2155
  • 07:47 mmandere: kubemaster2001, restart rsyslog
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2154 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P30705 and previous config saved to /var/cache/conftool/dbconfig/20220701-074607-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2153 to s1 T311493', diff saved to https://phabricator.wikimedia.org/P30704 and previous config saved to /var/cache/conftool/dbconfig/20220701-073512-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2091 from dbctl T311803', diff saved to https://phabricator.wikimedia.org/P30703 and previous config saved to /var/cache/conftool/dbconfig/20220701-060000-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2092 from dbctl T311802', diff saved to https://phabricator.wikimedia.org/P30701 and previous config saved to /var/cache/conftool/dbconfig/20220701-054102-marostegui.json
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2165.codfw.wmnet with OS bullseye
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:06 krinkle@deploy1002: Synchronized wmf-config/: I60edfb0f60 (3/3) (duration: 03m 31s)
  • 02:01 krinkle@deploy1002: Synchronized multiversion/: I60edfb0f60 (2/3) (duration: 03m 34s)
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2165.codfw.wmnet with OS bullseye
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2163.codfw.wmnet with OS bullseye
  • 01:39 krinkle@deploy1002: Synchronized tests/: I60edfb0f60 (1/3) (duration: 03m 32s)
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 krinkle@deploy1002: Synchronized src/: I796f38 (3/3) (duration: 03m 24s)
  • 01:26 krinkle@deploy1002: Synchronized multiversion/: I796f38 (2/3) (duration: 03m 32s)
  • 01:23 krinkle@deploy1002: Synchronized tests/: I796f38 (1/3) (duration: 03m 41s)
  • 01:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2162.codfw.wmnet with OS bullseye
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2163.codfw.wmnet with OS bullseye
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2161.codfw.wmnet with OS bullseye
  • 00:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 00:53 ejegg: updated payments-wiki from ef53c82e to 78dee85e
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2162.codfw.wmnet with OS bullseye
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2165.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2161.codfw.wmnet with OS bullseye
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2163.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2165.mgmt.codfw.wmnet with reboot policy FORCED

Archives

See Server Admin Log/Archives.