Server Admin Log: Difference between revisions

(508 intermediate revisions by 4 users not shown)
== 2021-02-05 ==
* 00:59 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec
* 00:36 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet
* 00:35 legoktm: enabled remote IPMI access on mw1349.mgmt.eqiad.wmnet and mw1380.mgmt.eqiad.wmnet
* 00:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well (duration: 02m 43s)
* 00:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well

== 2022-08-12 ==
* 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg [[phab:T315121|T315121]]
* 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer [[phab:T315121|T315121]]
* 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
* 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
* 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
* 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
* 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
* 20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
* 20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
* 20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
* 20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
* 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
* 19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
* 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
* 19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
* 19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
* 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
* 18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
* 18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
* 18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
* 18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
* 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
* 17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
* 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
* 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
* 17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
* 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
* 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
* 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
* 16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
* 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
* 15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
* 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
* 15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
* 15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
* 15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
* 14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
* 14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
* 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
* 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
* 14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
* 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
* 13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
* 13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
* 05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
* 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
* 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json


== 2021-02-04 ==
== 2022-08-11 ==
* 23:59 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 (duration: 01m 06s)
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1396.eqiad.wmnet
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1396.eqiad.wmnet
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1311.eqiad.wmnet
* 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
* 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1311.eqiad.wmnet
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source (duration: 01m 19s)
* 21:04 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: revert [[gerrit:806944{{!}}Define default value for "wmgSiteLogoVariants" (T305692 T308620)]] (duration: 03m 15s)
* 22:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1399.eqiad.wmnet
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1398.eqiad.wmnet
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
* 20:47 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806944{{!}}Define default value for "wmgSiteLogoVariants" (T305692 T308620)]] (duration: 03m 07s)
* 21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1399.eqiad.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1398.eqiad.wmnet
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 20:29 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.DesktopArticleTarget.init.js: Backport: [[gerrit:822396{{!}}Do not show incompatible skin warning when page is not editable (T314952)]] (duration: 03m 16s)
* 21:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1308.eqiad.wmnet
* 20:23 mutante: merging change on prod phabricator host to allow scap deployment, part 1
* 21:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet
* 19:42 damilare: payments-wiki upgraded from {{Gerrit|cf5e1848}} to {{Gerrit|0894d75a}}
* 20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1400.eqiad.wmnet
* 19:41 mutante: disabling puppet on C:profile::phabricator::main
* 20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.codfw.wmnet
* 19:20 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.wmnet
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
* 17:58 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:822428{{!}}Fix labtestwiki database name servers (T310795)]] (duration: 03m 39s)
* 20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1400.eqiad.wmnet
* 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2267.wmnet
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
* 17:52 sukhe: testing ATS 9.1.3-1wm1 on cp3064: [[phab:T309651|T309651]]
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
* 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
* 17:46 sukhe: testing ATS 9.1.3-1wm1 on cp3064: [[phab:T3096515|T3096515]]
* 20:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
* 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
* 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
* 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 Urbanecm: Purge several recompressed Wikipedia logos
* 17:38 sukhe: testing ATS 9.1.3-1wm1 on cp1090: [[phab:T309651|T309651]]
* 19:52 urbanecm@deploy1001: Synchronized logos/config.yaml: Recompress several Wikipedia logos (2/2) (duration: 01m 05s)
* 17:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Recompress several Wikipedia logos (1/2) (duration: 01m 07s)
* 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host netmon2002
* 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1309.eqiad.wmnet
* 17:34 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host netmon2002
* 19:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|968ae8b69d7f743f0e589ba3568de36bc462c7d6}}: sysop_itwiki: Set wmgUsePopups to false ([[phab:T259480|T259480]]) (duration: 01m 06s)
* 17:33 sukhe: testing ATS 9.1.3-1wm1 on cp3065: [[phab:T309651|T309651]]
* 19:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
* 17:28 sukhe: testing ATS 9.1.3-1wm1 on cp1089: [[phab:T309651|T309651]]
* 19:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
* 17:19 bking@cumin1001: conftool action : set/weight=10:pooled=no; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
* 19:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|a199b8384f4226b70fc00538f01e41a9a68b3ea3}}: abusefilter: enwikibooks: Enable block action ([[phab:T273864|T273864]]) (duration: 01m 06s)
* 17:18 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
* 19:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|35e6e4014eee7946979fbf6cd782ae90a3612b82}}: Remove ruwiki A/B test for WelcomeSurvey ([[phab:T273900|T273900]]) (duration: 01m 07s)
* 17:15 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
* 19:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|74e7f70c7c8ae4c8ee9589262d088562c7274b98}}: wgAbuseFilterAflFilterMigrationStage: Make READ_NEW in production ([[phab:T269712|T269712]]) (duration: 01m 11s)
* 16:35 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 19:06 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I498a0c4af}} [[phab:T263496|T263496]]"'
* 16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 19:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet
* 16:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 19:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1309.eqiad.wmnet
* 16:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 18:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1401.eqiad.wmnet
* 16:26 inflatador: bking@elastic1054 attempting to ban elastic1100-1102 from cluster due to firewall issues
* 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
* 16:13 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
* 18:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic1100
* 18:45 cdanis: [[phab:T263496|T263496]] deployed {{Gerrit|I498a0c4af}} on cp2027 at 18:29; now deploying on cp3060
* 15:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:45 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32364 and previous config saved to /var/cache/conftool/dbconfig/20220811-145823-ladsgroup.json
* 18:28 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I498a0c4af}} [[phab:T263496|T263496]]"'
* 14:55 inflatador: bking@cumin1001 running puppet agent across eqiad elastic hosts
* 18:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2278.codfw.wmnet
* 14:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1401.eqiad.wmnet
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32362 and previous config saved to /var/cache/conftool/dbconfig/20220811-144318-ladsgroup.json
* 18:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert - Migrate PrefUpdate schema to Event Platform on  all wikis - leave on testwiki only, seeing validation errors.  [[phab:T267348|T267348]] (duration: 01m 01s)
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32361 and previous config saved to /var/cache/conftool/dbconfig/20220811-142813-ladsgroup.json
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
* 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1003.wikimedia.org
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
* 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
* 14:24 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 17:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
* 14:19 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1003.wikimedia.org
* 17:51 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on  all wikis - [[phab:T267348|T267348]] (duration: 01m 01s)
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1004.wikimedia.org
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; resync) (duration: 01m 07s)
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:41 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; 2/2) (duration: 01m 07s)
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:38 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; 1/2) (duration: 03m 12s)
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:46 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on testwiki - [[phab:T267348|T267348]] (duration: 01m 08s)
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822375{{!}}Stop writing to the old templatelinks fields in s2 (T312865)]] (duration: 03m 25s)
* 16:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 14:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2023.codfw.wmnet
* 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:00 moritzm: draining ganeti3002 for eventual reboot
* 14:15 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:57 moritzm: failover ganeti master in esams to ganeti3001
* 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 15:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2023.codfw.wmnet
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32360 and previous config saved to /var/cache/conftool/dbconfig/20220811-141309-ladsgroup.json
* 15:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2022.codfw.wmnet
* 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:55 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 14:11 awight: EU backport window complete
* 15:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2022.codfw.wmnet
* 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:29 moritzm: draining ganeti3001 for eventual reboot
* 14:10 awight@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: [[gerrit:822149{{!}}CommentFormatter: Set 'data-mw-comment' even when reply tool disabled (T314707)]] (duration: 03m 31s)
* 15:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1004.wikimedia.org
* 15:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2021.codfw.wmnet
* 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:20 moritzm: draining ganeti3003 for eventual reboot
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2021.codfw.wmnet
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2020.codfw.wmnet
* 13:52 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 [[phab:T309896|T309896]] - mvernon@cumin2002
* 15:01 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 13:50 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:820666{{!}}Revert "Revert "testwiki: Add mediawiki.web_ui.interactions stream""]] (duration: 03m 10s)
* 14:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2020.codfw.wmnet
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2001.codfw.wmnet
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:43 jynus: stop db1095 instance in preparation of its decom [[phab:T273732|T273732]]
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
* 13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1060.eqiad.wmnet with OS bullseye
* 14:38 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:37 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
* 13:36 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822130{{!}}trwikiquote: Install WikiLove extension (T314895)]] (duration: 03m 30s)
* 14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 13:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host logstash2003.codfw.wmnet
* 14:21 godog: roll-restart rsync/swift-object-replicator in codfw to apply memory limits
* 13:25 awight@deploy1002: Synchronized static/images: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 3) (duration: 03m 09s)
* 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
* 13:21 awight@deploy1002: Synchronized logos/: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 2) (duration: 03m 09s)
* 14:18 effie: start rolling reboots of mc[2019-2027,2029-2037].codfw.wmnet [[phab:T273278|T273278]]
* 13:19 topranks: merging CR821781 to expose additional network info in puppet facts
* 14:16 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@47fc426]: (no justification provided) (duration: 00m 12s)
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:16 mbsantos@deploy1001: Started deploy [kartotherian/deploy@47fc426]: (no justification provided)
* 13:18 awight@deploy1002: Synchronized wmf-config/: Config: [[gerrit:821330{{!}}Revert "trwiki: Change old and new vector logos for 500k articles"]] (part 1) (duration: 03m 13s)
* 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:14 moritzm: installing ffmpeg security updates on stretch
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: (no justification provided) (duration: 00m 03s)
* 13:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
* 14:11 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: (no justification provided)
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 mbsantos@deploy1001: Finished deploy [tilerator/deploy@46a2eaf]: (no justification provided) (duration: 00m 13s)
* 13:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
* 14:10 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf]: (no justification provided)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: NO-OP: {{Gerrit|7c67b2f03cbc27cf9e5f214a6f0ea0856d8c1ae4}}: bnwiki: wgGEHelpPanelLinks: Remove text in brackets ([[phab:T266020|T266020]]) (duration: 01m 12s)
* 13:08 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:822073{{!}}Enable editor line numbering on all namespaces, for twwiki (T302852)]] (duration: 03m 42s)
* 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
* 12:56 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1060.eqiad.wmnet with OS bullseye
* 13:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
* 12:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 13:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
* 12:49 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 13:44 vgutierrez: rolling restart of ncredir instances (kernel upgrade)
* 12:46 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 13:36 moritzm: installing openldap security updates on buster (client-side tools/libs only, slapd instance already updated)
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
* 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[367].codfw.wmnet
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:31 godog: reboot logstash2005.codfw.wmnet, no ssh / stuck
* 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 12:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 12:16 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:10 jbond42: upload cas_6.2.7 to downgrade cas [[phab:T273867|T273867]]
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2003.codfw.wmnet
* 13:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
* 12:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 13:02 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
* 12:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:27 moritzm: installing libdatetime-timezone-perl updates on Buster
* 12:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: reboot
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 17 hosts with reason: reboot
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:17 moritzm: rebooting mw[1264-1268,1276-1277,1337-1338,1404-1409,1411,1413].eqiad.wmnet for kernel update
* 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:08 godog: bounce rsyslog on centrallog1001
* 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1009.eqiad.wmnet
* 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1009.eqiad.wmnet
* 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:30 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:26 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:07 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal
* 09:32 godog: arm keyholder on netmon2001
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 93 hosts with reason: reboot
* 09:09 jbond: update gnutls28 on bullseye systems
* 10:35 moritzm: rebooting mw[2261-2262,2268-2271,2273-2277,2283-2288,2290-2335,2337-2339,2350-2376].codfw.wmnet
* 09:00 jbond: update unzip
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 93 hosts with reason: reboot
* 08:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14204 and previous config saved to /var/cache/conftool/dbconfig/20210204-102312-root.json
* 08:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:15 elukey: restart pybal on lvs1015 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 08:12 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:13 elukey: restart pybal on lvs2009 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 10:08 elukey: restart pybal on lvs1016 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 08:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14203 and previous config saved to /var/cache/conftool/dbconfig/20210204-100808-root.json
* 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 10:05 elukey: restart pybal on lvs2010 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 07:57 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
* 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14202 and previous config saved to /var/cache/conftool/dbconfig/20210204-095305-root.json
* 07:51 vgutierrez: rolling restart of pybal in eqsin and ulsfo
* 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:24 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 09:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 37 hosts with reason: reboot
* 07:24 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
* 09:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 37 hosts with reason: reboot
* 07:23 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 09:41 moritzm: rebooting mw[2215-2219,2221-2243,2246-2249,2251-2253,2255,2258] for kernel update
* 07:19 _joe_: pooling all services in codfw
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14201 and previous config saved to /var/cache/conftool/dbconfig/20210204-093801-root.json
* 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32357 and previous config saved to /var/cache/conftool/dbconfig/20220811-070312-ladsgroup.json
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flowspec1001.eqiad.wmnet
* 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flowspec1001.eqiad.wmnet
* 07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 09:24 XioNoX: re-enable ping offload in esams - [[phab:T273278|T273278]]
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32356 and previous config saved to /var/cache/conftool/dbconfig/20220811-070252-ladsgroup.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1078 from dbctl [[phab:T273597|T273597]]', diff saved to https://phabricator.wikimedia.org/P14199 and previous config saved to /var/cache/conftool/dbconfig/20210204-092414-marostegui.json
* 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32355 and previous config saved to /var/cache/conftool/dbconfig/20220811-064746-ladsgroup.json
* 09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3001.esams.wmnet
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32354 and previous config saved to /var/cache/conftool/dbconfig/20220811-063240-ladsgroup.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 30%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14198 and previous config saved to /var/cache/conftool/dbconfig/20210204-092257-root.json
* 06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping3001.esams.wmnet
* 06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:17 XioNoX: disable ping offload in esams (eqiad re-enabled) - [[phab:T273278|T273278]]
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32353 and previous config saved to /var/cache/conftool/dbconfig/20220811-061734-ladsgroup.json
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1001.eqiad.wmnet
* 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
* 09:15 godog: roll restart lvs low-traffic in codfw/eqiad for swift healthcheck updates
* 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
* 09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping1001.eqiad.wmnet
* 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 ([[phab:T314368|T314368]] [[phab:T298555|T298555]] [[phab:T312863|T312863]] [[phab:T310011|T310011]] [[phab:T309311|T309311]] [[phab:T60674|T60674]] [[phab:T298560|T298560]] [[phab:T303603|T303603]] [[phab:T310485|T310485]])', diff saved to https://phabricator.wikimedia.org/P32352 and previous config saved to /var/cache/conftool/dbconfig/20220811-060625-ladsgroup.json
* 09:10 XioNoX: disable ping offload in eqiad (codfw re-enabled) - [[phab:T273278|T273278]]
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary and set section read-write [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32351 and previous config saved to /var/cache/conftool/dbconfig/20220811-060113-ladsgroup.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14197 and previous config saved to /var/cache/conftool/dbconfig/20210204-090754-root.json
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32350 and previous config saved to /var/cache/conftool/dbconfig/20220811-060042-ladsgroup.json
* 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2001.codfw.wmnet
* 06:00 Amir1: Starting s2 eqiad failover from db1162 to db1122 - [[phab:T314368|T314368]]
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping2001.codfw.wmnet
* 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 [[phab:T314368|T314368]]', diff saved to https://phabricator.wikimedia.org/P32349 and previous config saved to /var/cache/conftool/dbconfig/20220811-051913-ladsgroup.json
* 09:02 XioNoX: disable ping offload in codfw - [[phab:T273278|T273278]]
* 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T314368|T314368]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 20%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14196 and previous config saved to /var/cache/conftool/dbconfig/20210204-085250-root.json
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T314368|T314368]]
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 15%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14195 and previous config saved to /var/cache/conftool/dbconfig/20210204-083747-root.json
* m: chown -R librenms /srv/librenms/rrd/ on netmon1003 [[phab:T314972|T314972]]
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 03:51 cwhite: chown librenms /srv/librenms/rrd/* on netmon1003 [[phab:T314972|T314972]]
* 08:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 02:55 ejegg: civicrm upgraded from {{Gerrit|1f91ac2d}} to {{Gerrit|92467234}}
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 12%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14194 and previous config saved to /var/cache/conftool/dbconfig/20210204-082243-root.json
* 02:46 ejegg: updated process-control yaml files with @wmff alias
* 08:22 moritzm: reset failed ifup@ens5 on xhgui2001/xhgui1001 [[phab:T273026|T273026]]
* 02:08 ejegg: civicrm rolled back from {{Gerrit|92467234}} to {{Gerrit|1f91ac2d}}
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14193 and previous config saved to /var/cache/conftool/dbconfig/20210204-081605-root.json
* 02:05 ejegg: civicrm upgraded from {{Gerrit|1f91ac2d}} to {{Gerrit|92467234}}
* 08:10 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
* 01:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:08 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
* 01:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14192 and previous config saved to /var/cache/conftool/dbconfig/20210204-080740-root.json
* 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14191 and previous config saved to /var/cache/conftool/dbconfig/20210204-080101-root.json
* 01:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 7%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14190 and previous config saved to /var/cache/conftool/dbconfig/20210204-075236-root.json
* 01:38 tstarling@deploy1002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 03m 25s)
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14189 and previous config saved to /var/cache/conftool/dbconfig/20210204-074558-root.json
* 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14188 and previous config saved to /var/cache/conftool/dbconfig/20210204-073733-root.json
* 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14187 and previous config saved to /var/cache/conftool/dbconfig/20210204-073054-root.json
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=varnish-fe
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 3%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14186 and previous config saved to /var/cache/conftool/dbconfig/20210204-072229-root.json
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
* 07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1117.eqiad.wmnet
* 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-tls
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 20%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14185 and previous config saved to /var/cache/conftool/dbconfig/20210204-071551-root.json
* 00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow
* 07:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1117.eqiad.wmnet
* 00:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 2%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14184 and previous config saved to /var/cache/conftool/dbconfig/20210204-070726-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14183 and previous config saved to /var/cache/conftool/dbconfig/20210204-070047-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14182 and previous config saved to /var/cache/conftool/dbconfig/20210204-064544-root.json
* 06:42 marostegui: Restart mysql on db1137
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14181 and previous config saved to /var/cache/conftool/dbconfig/20210204-064157-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14180 and previous config saved to /var/cache/conftool/dbconfig/20210204-063033-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1173 to dbctl - depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14179 and previous config saved to /var/cache/conftool/dbconfig/20210204-062836-marostegui.json
* 02:02 legoktm@deploy1001: Synchronized logos/config.yaml: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (2/2) (duration: 01m 06s)
* 02:00 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (1/2) (duration: 01m 10s)
* 01:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours (duration: 01m 16s)
* 01:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours
* 01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1310.eqiad.wmnet
* 00:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
* 00:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1310.eqiad.wmnet
* 00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet
* 00:17 eileen: civicrm revision changed from {{Gerrit|dfb2ea2148}} to {{Gerrit|1e9a86dd6e}}, config revision is {{Gerrit|01ea3062f4}}
* 00:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2279.codw.wmnet
* 00:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
* 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE
* 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE


== 2021-02-03 ==
== 2022-08-10 ==
* 23:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
* 21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
* 21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 23:51 mutante: installservers: replacing squid proxy logrotate cron with systemd timer
* 21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2279.codfw.wmnet with reason: REIMAGE
* 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 23:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (
* 21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:00 cjming: end of UTC late backport


== 2021-02-02 ==
== 2022-08-09 ==
* 23:53 mutante: mw1300 - scap pull (it crashed earlier but is back after powercycling)
* 23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
* 23:52 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:30 mutante: powercycling crashed mw1300.eqiad.wmnet
* 23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1335.eqiad.wmnet
* 22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1336.eqiad.wmnet
* 22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1335.eqiad.wmnet
* 22:49 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1336.eqiad.wmnet
* 22:49 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 22:46 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1015.eqiad.wmnet
* 21:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 22:31 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 21:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 22:31 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 21:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 22:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'  # test on cp2027 looks good, perhaps slightly-increased Varnish CPU consumption but hard to be sure
* 22:02 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:00 Lucas_WMDE: Morning backport window done
* 22:02 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 19:58 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/WikibaseMediaInfo/: Backport: [[gerrit:661092{{!}}Pass $databaseName into WikiPageEntityDataLoader (T273622)]] (duration: 01m 07s)
* 21:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 19:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/: Backport: [[gerrit:661091{{!}}Add wiki ID to WikiPageEntityDataLoader (T273622)]] (duration: 01m 25s)
* 21:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 19:52 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'
* 21:53 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 19:00 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 21:52 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 18:48 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 21:50 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 18:43 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 21:49 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 18:23 milimetric@deploy1001: Finished deploy [analytics/turnilo/deploy@052348b]: (no justification provided) (duration: 00m 03s)
* 21:43 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
* 18:23 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 21:43 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
* 18:22 milimetric@deploy1001: deploy aborted: (no justification provided) (duration: 00m 10s)
* 21:43 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 18:22 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 21:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 18:17 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:43 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 18:07 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:43 bking@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 18:03 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth2001.codfw.wmnet
* 21:00 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth1002.eqiad.wmnet
* 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth1002.eqiad.wmnet
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth2001.codfw.wmnet
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32332 and previous config saved to /var/cache/conftool/dbconfig/20220809-205548-ladsgroup.json
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
* 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1014.eqiad.wmnet
* 15:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 20:51 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1014.eqiad.wmnet
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
* 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32331 and previous config saved to /var/cache/conftool/dbconfig/20220809-204042-ladsgroup.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14135 and previous config saved to /var/cache/conftool/dbconfig/20210202-143950-root.json
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32330 and previous config saved to /var/cache/conftool/dbconfig/20220809-202536-ladsgroup.json
* 14:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1001.eqiad.wmnet
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32329 and previous config saved to /var/cache/conftool/dbconfig/20220809-201030-ladsgroup.json
* 14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid1001.eqiad.wmnet
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 19:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:35 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2003.codfw.wmnet
* 19:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
* 19:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 19:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:26 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.29 (duration: 73m 10s)
* 19:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: [[phab:T314890|T314890]]
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
* 19:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14134 and previous config saved to /var/cache/conftool/dbconfig/20210202-142446-root.json
* 19:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 14:21 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2003.codfw.wmnet
* 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14133 and previous config saved to /var/cache/conftool/dbconfig/20210202-140943-root.json
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14132 and previous config saved to /var/cache/conftool/dbconfig/20210202-135439-root.json
* 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:49 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 17:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:49 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2001.codfw.wmnet
* 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1072.eqiad.wmnet with OS bullseye
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14128 and previous config saved to /var/cache/conftool/dbconfig/20210202-133936-root.json
* 17:29 vgutierrez: test trafficserver 9.1.2-1wm2 in cp6016 - [[phab:T309651|T309651]]
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 13:32 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 17:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
* 13:31 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1003.eqiad.wmnet
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1072.eqiad.wmnet with OS bullseye
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2001.codfw.wmnet
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1002.eqiad.wmnet
* 16:53 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:13 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.29
* 16:53 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:13 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1003.eqiad.wmnet
* 16:26 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc2001.codfw.wmnet
* 16:26 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc1002.eqiad.wmnet
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1069.eqiad.wmnet with OS bullseye
* 13:11 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1002.eqiad.wmnet
* 15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
* 15:42 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2001.codfw.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid2001.codfw.wmnet
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
* 15:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1069.eqiad.wmnet with OS bullseye
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
* 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1058.eqiad.wmnet with OS bullseye
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
* 15:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 12:52 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* 15:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
* 12:52 klausman@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ml-etcd1002.eqiad.wmnet
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:51 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
* m: finished running 'homer "status:active" commit "netmon: Add the netmon1003 host as a syslog destination"' in the cumin1001 host. Homer reported no errors.
* 12:50 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
* 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:47 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
* 14:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1058.eqiad.wmnet with OS bullseye
* 12:46 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
* 14:28 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
* 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2001.codfw.wmnet
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:46 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd1002.eqiad.wmnet
* 13:57 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* m: Add the new netmon1003 host as a syslog destination in homer templates/common/system.conf https://gerrit.wikimedia.org/r/c/operations/homer/public/+/819124
* 12:44 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* m: Successfully ran '# run-puppet-merge' in the netmon1002 and netmon1003 hosts.
* 12:43 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* m: Running '# run-puppet-agent' in the netmon1003 host
* 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1001.eqiad.wmnet
* m: Running '# run-puppet-agent' in the netmon1002 host
* 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 13:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 12:42 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 13:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
* m: puppet-merge on puppetmaster2004.codfw.wmnet for patch 819179 succeeded
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
* m: Set netmon1003 as netmon_server and netmon1002 as a netmon_servers_failover in the Puppet repository https://gerrit.wikimedia.org/r/c/operations/puppet/+/819179
* 12:41 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1001.eqiad.wmnet
* m: authdns updated successfully
* 12:41 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* m: Had to revert https://gerrit.wikimedia.org/r/c/operations/dns/+/819177 because I rebased my changes incorrectly, sent the new patch in https://gerrit.wikimedia.org/r/c/operations/dns/+/821746
* 12:40 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* m: running '# authdns-update' in  ns0.wikimedia.org
* 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
* m: Flip DNS for LibreNMS and Smokeping from netmon1002 to netmon1003 https://gerrit.wikimedia.org/r/c/operations/dns/+/819177
* 12:40 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
* 13:23 jynus: stop replication on db1117:m1 [[phab:T309074|T309074]]
* 12:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
* m: netmon1002 to netmon1003 failover
* 12:38 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki1001.eqiad.wmnet
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
* 13:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
* 10:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
* 09:53 vgutierrez: rolling restart of pybal in eqsin - [[phab:T310070|T310070]]
* 12:36 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
* 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
* 09:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
* 09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:34 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* 09:12 vgutierrez: rolling restart of pybal in codfw - [[phab:T310070|T310070]]
* 12:34 urbanecm@deploy1001: Synchronized docroot/noc/conf/index.php: {{Gerrit|995649efafc2f5a44824af1e96128baaf15ef928}}: noc: yaml files may be published w/o .txt extension (duration: 00m 57s)
* 08:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
* 08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:30 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:30 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
* 08:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:26 urbanecm@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: {{Gerrit|210647e915c91a4bddf0407d05436a9e231d3f29}}: noc: Publicly expose logos/config.yaml (2/2; [[phab:T273330|T273330]]) (duration: 00m 55s)
* 08:24 jynus: starting data check using es1021 and es2021, expect increased read traffic [[phab:T314559|T314559]]
* 12:23 urbanecm@deploy1001: Synchronized docroot/noc/conf/logos-config.yaml: {{Gerrit|210647e915c91a4bddf0407d05436a9e231d3f29}}: noc: Publicly expose logos/config.yaml (1/2; [[phab:T273330|T273330]]) (duration: 00m 57s)
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:22 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 12:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/HomepageModules/Banner.php: {{Gerrit|da8f328640ca5c46385a57e706cd76614bbfdc7a}}: Banner module: Switch to using activated/unactivated for state ([[phab:T273084|T273084]]) (duration: 00m 58s)
* 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 12:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: {{Gerrit|18c59d018b6ef72c750e25588518d2df6f492db3}}: SpecialHomepage: Do not load start-startediting if SE arent enabled ([[phab:T273243|T273243]]) (duration: 01m 01s)
* 06:19 Amir1: dbmaint s5@eqiad ([[phab:T312863|T312863]] [[phab:T312984|T312984]] [[phab:T310011|T310011]] [[phab:T310485|T310485]])
* 12:18 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1001.eqiad.wmnet
* 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
* 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
* 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
* 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32323 and previous config saved to /var/cache/conftool/dbconfig/20220809-060836-ladsgroup.json
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
* 06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 12:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32322 and previous config saved to /var/cache/conftool/dbconfig/20220809-060159-ladsgroup.json
* 12:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32321 and previous config saved to /var/cache/conftool/dbconfig/20220809-060105-ladsgroup.json
* 12:13 jbond42: upload cas_6.3 package
* 06:00 Amir1: Starting s5 eqiad failover from db1130 to db1100 - [[phab:T314370|T314370]]
* 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 [[phab:T314370|T314370]]', diff saved to https://phabricator.wikimedia.org/P32320 and previous config saved to /var/cache/conftool/dbconfig/20220809-051251-ladsgroup.json
* 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 05:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T314370|T314370]]
* 12:11 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T314370|T314370]]
* 11:06 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 02:42 ejegg: SmashPig upgraded from {{Gerrit|9b97ea15}} to {{Gerrit|13e9e9cc}}
* 11:04 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32318 and previous config saved to /var/cache/conftool/dbconfig/20220809-023113-ladsgroup.json
* 10:30 XioNoX: re-enable DE-CIX codfw peering sessions
* 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 to clone db1174 - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14121 and previous config saved to /var/cache/conftool/dbconfig/20210202-100859-marostegui.json
* 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32317 and previous config saved to /var/cache/conftool/dbconfig/20220809-023052-ladsgroup.json
* 10:08 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 02:28 ejegg: payments-wiki upgraded from {{Gerrit|6880236d}} to {{Gerrit|cf5e1848}}
* 10:02 hashar: Restarted Gerrit primary on gerrit1001 # [[phab:T273223|T273223]]
* 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32316 and previous config saved to /var/cache/conftool/dbconfig/20220809-021546-ladsgroup.json
* 10:00 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 [[phab:T273223|T273223]] (duration: 00m 09s)
* 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32315 and previous config saved to /var/cache/conftool/dbconfig/20220809-020040-ladsgroup.json
* 10:00 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 [[phab:T273223|T273223]]
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32314 and previous config saved to /var/cache/conftool/dbconfig/20220809-014534-ladsgroup.json
* 10:00 hashar: Restarted Gerrit replica on gerrit2001 # [[phab:T273223|T273223]]
* 09:56 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 [[phab:T273223|T273223]] (duration: 00m 12s)
* 09:56 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 [[phab:T273223|T273223]]
* 09:27 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
* 08:56 XioNoX: disable DE-CIX codfw peering session
* 08:30 godog: swift eqiad-prod: add weight back to sdg on ms-be1054 - [[phab:T273582|T273582]]
* 08:02 legoktm: depooled mw1381.eqiad.wmnet for perf testing ([[phab:T273312|T273312]])
* 07:59 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
* 07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
* 07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14118 and previous config saved to /var/cache/conftool/dbconfig/20210202-073105-root.json
* 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14117 and previous config saved to /var/cache/conftool/dbconfig/20210202-071602-root.json
* 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14116 and previous config saved to /var/cache/conftool/dbconfig/20210202-070057-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14115 and previous config saved to /var/cache/conftool/dbconfig/20210202-064553-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14114 and previous config saved to /var/cache/conftool/dbconfig/20210202-063050-root.json
* 06:24 marostegui: Restart mysql on es1022
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14113 and previous config saved to /var/cache/conftool/dbconfig/20210202-062303-marostegui.json
* 04:12 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 03:40 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@ad9db35]: 0.3.62 (duration: 06m 59s)
* 03:29 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.62` on canary `wdqs1003`; proceeding to rest of fleet
* 03:29 ryankemper@deploy1001: Started deploy [wdqs/wdqs@ad9db35]: 0.3.62
* 03:26 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.62`. Pre-deploy tests passing on canary `wdqs1003`
* 03:21 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1006`


== 2021-02-01 ==
== 2022-08-08 ==
* 23:54 legoktm@deploy1001: Synchronized wmf-config/profiler.php: profiler: Send data to excimer-buster pipeline ([[phab:T273312|T273312]]) (duration: 00m 57s)
* 23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 19s)
* 23:15 legoktm: depooling mw1403 and mw1405 for perf testing
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
* 23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 27s)
* 23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
* 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:05 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/Collection/includes/Specials/SpecialCollection.php: {{Gerrit|3c7864ca1d5aadc9cd251939c0e23f661faef5e9}}: Remove unnecessary calls to WikiPage ([[phab:T273101|T273101]]) (duration: 00m 58s)
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:09 sbassett: Deployed security patch for [[phab:T272386|T272386]]
* 23:32 eileen___: config revision changed from {{Gerrit|f5668044}} to {{Gerrit|787cd0e0}}
* 22:05 sbassett: Deployed security patch for [[phab:T270713|T270713]]
* 23:32 eileen___: civicrm upgraded from {{Gerrit|497bddf7}} to {{Gerrit|1f91ac2d}}
* 22:04 legoktm: depooling mw1278.eqiad.wmnet for perf testing
* 22:16 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet
* 22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic1065.eqiad.wmnet with OS bullseye
* 22:03 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet
* 21:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 21:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
* 20:53 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker hacked fix for [[phab:T272410|T272410]] (duration: 00m 57s)
* 21:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1065.eqiad.wmnet with OS bullseye
* 20:52 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker hacked fix for [[phab:T272410|T272410]]
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1062.eqiad.wmnet with OS bullseye
* 20:27 legoktm: depooling mw1277.eqiad.wmnet for perf testing
* 20:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 19:42 Urbanecm: Morning B&C done
* 20:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
* 19:41 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: {{Gerrit|1acaba4b3650dfb757d29af5395cc7660c839756}}: SpecialHomepage: Do not load start-startediting if SE arent enabled ([[phab:T273243|T273243]]) (duration: 01m 05s)
* 20:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1062.eqiad.wmnet with OS bullseye
* 19:39 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/Banner.php: {{Gerrit|d39746aa3ed07dfa9173a98d253c61771d5592a1}}: Banner module: Switch to using activated/unactivated for state ([[phab:T273084|T273084]]) (duration: 01m 05s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:23 mutante: gerrit2001 - restarting gerrit (replica)
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6360e7899b7dedc29941783b1cdf76df8db073d7}}: Enable DiscussionTools as a beta feature on 3 wikis per request ([[phab:T258554|T258554]]; [[phab:T265829|T265829]]; [[phab:T273192|T273192]]) (duration: 01m 04s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a98f08f9582215e8f12f9e9c43f79a1f2fc21a2f}}: Enable DiscussionTools as a beta feature on wikis with language variants ([[phab:T272639|T272639]]) (duration: 01m 07s)
* 20:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 18:57 mutante: restarting gerrit for change 660030 (no ticket)
* 20:28 cjming: end of UTC late backport window
* 18:44 mutante: new Wikimedia project language "mni" added - Meitei is a Sino-Tibetan language and the predominant language and lingua franca of the state of Manipur in northeastern India.
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:03 mutante: ping3001 - apt-get clean; apt-get autoremove; let it finish kernel upgrade; was out of disk
* 20:27 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.styles/layouts/grid.less: Backport: [[gerrit:821243{{!}}Fix grid blowout bug (T314756)]] (duration: 03m 26s)
* 17:59 mutante: ping2001 - apt-get clean; apt autoremove - was out of disk as well
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:52 mutante: ping1001 - apt-get clean gets back 447M - it was out of disk completely, now 84% usage
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:817785{{!}}Disable sticky header edit A/B test for pilot wikis (T312296)]] (duration: 03m 35s)
* 17:17 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
* 17:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS bullseye
* 17:15 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
* 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
* 17:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
* 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS bullseye
* 17:14 mutante: decom'ing francium.eqiad.wmnet, formerly HTML dumps server, replaced by htmldumper1001
* 16:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS bullseye
* 17:13 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
* 16:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 17:12 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:12 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
* 16:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 17:12 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
* 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
* 16:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 17:10 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
* 16:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 17:10 sukhe: upload dnsdist_1.5.1-3wm1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic1085.eqiad.wmnet with OS bullseye
* 17:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
* 16:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 17:09 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host install2003.wikimedia.org
* 16:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
* 17:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
* 16:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 17:08 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
* 16:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 17:07 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
* 16:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 17:06 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
* 16:10 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 17:03 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
* 16:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 17:01 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
* 16:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
* 17:01 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
* 16:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS bullseye
* 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
* 15:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 16:52 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
* 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 16:50 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
* 15:46 sukhe: upload reprepro -C main include bullseye-wikimedia python-pynetbox_6.6.0-1+wmf11u1_amd64.changes
* 16:46 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
* 16:46 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
* 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 16:44 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1001.eqiad.wmnet
* 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
* 16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2001.codfw.wmnet
* 15:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS bullseye
* 16:43 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 16:42 jbond42: enable puppet fleet wide post reboots
* 14:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 16:35 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: [[phab:T314256|T314256]]
* 16:34 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host pki1001.eqiad.wmnet
* 14:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 16:34 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2003.codfw.wmnet
* 14:11 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 16:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
* 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:33 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2002.codfw.wmnet
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
* 12:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:28 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
* 12:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|77fd5abdd7d9462869259e1511bbcf2d7ce62246}}: Growth: Add new rights to wgAvailableRights (duration: 03m 24s)
* 16:28 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2003.codfw.wmnet
* 12:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
* 16:28 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2002.codfw.wmnet
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:28 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:28 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster1001.eqiad.wmnet
* 12:06 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/: {{Gerrit|3eaf155678b7313c55dcca0cd39ab29f73eead37}}: MentorTools: Do not use MentorWeightManager ([[phab:T314362|T314362]]) (duration: 03m 31s)
* 16:26 jbond@cumin2001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb1002.eqiad.wmnet
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
* 11:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
* 16:21 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1003.eqiad.wmnet
* 11:21 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2022.codfw.wmnet
* 16:15 XioNoX: fail-back RG1 back to node1 on pfw3-eqiad - [[phab:T263833|T263833]]
* 11:21 jelto: kubectl uncordon kubernetes2022.codfw.wmnet
* 16:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
* 10:43 Amir1: Removing db2079 from orchestrator ([[phab:T313885|T313885]])
* 16:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
* 10:39 Amir1: Removing db2079 from zarcillo ([[phab:T313885|T313885]])
* 16:14 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1003.eqiad.wmnet
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2079.codfw.wmnet
* 16:13 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 jbond42: disable puppet fleet wide to perform reboots
* 10:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
* 16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2079.codfw.wmnet
* 16:05 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 16:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
* 16:03 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 08:41 jbond: deploy libtirpc update
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14110 and previous config saved to /var/cache/conftool/dbconfig/20210201-160122-root.json
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32310 and previous config saved to /var/cache/conftool/dbconfig/20220808-075723-ladsgroup.json
* 15:59 jbond42: install buster kernel update
* 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 XioNoX: failover RG1 back to node0 on pfw3-eqiad - [[phab:T263833|T263833]]
* 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 80%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14109 and previous config saved to /var/cache/conftool/dbconfig/20210201-154618-root.json
* 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32309 and previous config saved to /var/cache/conftool/dbconfig/20220808-075702-ladsgroup.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 60%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14108 and previous config saved to /var/cache/conftool/dbconfig/20210201-153115-root.json
* 07:53 godog: grow sda/sdb 3 by 100G on thanos-be2001 - [[phab:T314275|T314275]]
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 40%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14107 and previous config saved to /var/cache/conftool/dbconfig/20210201-151611-root.json
* 07:50 godog: grow sda/sdb 3 by 100G on thanos-be1004 - [[phab:T314275|T314275]]
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 20%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14106 and previous config saved to /var/cache/conftool/dbconfig/20210201-150107-root.json
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32308 and previous config saved to /var/cache/conftool/dbconfig/20220808-074156-ladsgroup.json
* 14:53 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14105 and previous config saved to /var/cache/conftool/dbconfig/20210201-144604-root.json
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:40 marostegui: Restart mysql on db1147 [[phab:T266483|T266483]]
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32307 and previous config saved to /var/cache/conftool/dbconfig/20220808-072650-ladsgroup.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14104 and previous config saved to /var/cache/conftool/dbconfig/20210201-143925-marostegui.json
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:12 ladsgroup@deploy1001: Finished scap: [[gerrit:660796{{!}}Add Multilingual Wikisource to list of Wikidata's special sites]] ([[phab:T138332|T138332]]) (duration: 21m 52s)
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820815{{!}}trwikivoyage: Create rollbacker user group (T314678)]] (duration: 03m 17s)
* 13:50 ladsgroup@deploy1001: Started scap: [[gerrit:660796{{!}}Add Multilingual Wikisource to list of Wikidata's special sites]] ([[phab:T138332|T138332]])
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:47 ladsgroup@deploy1001: scap sync-l10n completed (1.36.0-wmf.28) (duration: 00m 58s)
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 03s)
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14102 and previous config saved to /var/cache/conftool/dbconfig/20210201-124308-root.json
* 07:11 elukey: restart rsyslog on ml-serve2007
* 12:42 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|d70e8ac549145872c9d251cc78e6e40355029fc7}}: Update ombudsmenwiki logo (3/3) (duration: 01m 05s)
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:42 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/ombudsmenwiki.png' ([[phab:T273323|T273323]])
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32306 and previous config saved to /var/cache/conftool/dbconfig/20220808-071144-ladsgroup.json
* 12:41 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|d70e8ac549145872c9d251cc78e6e40355029fc7}}: Update ombudsmenwiki logo (2/3) (duration: 01m 04s)
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:40 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|d70e8ac549145872c9d251cc78e6e40355029fc7}}: Update ombudsmenwiki logo (1/3) (duration: 01m 05s)
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cf349361b392f1831fe5ebce8fb544b035a83835}}: ombudsmenwiki: Set sitename to "Ombuds Commission" ([[phab:T273323|T273323]]) (duration: 01m 06s)
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820261{{!}}Enable SectionTranslation on 10 Wikipedias where ContentTranslation is default (T308829)]] (duration: 03m 15s)
* 12:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: Regenerate a couple of logos from Commons (2/2) (duration: 01m 08s)
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:34 urbanecm@deploy1001: Synchronized logos/config.yaml: Regenerate a couple of logos from Commons (1/2) (duration: 01m 07s)
* 07:06 XioNoX: add CSP headers to Netbox - [[phab:T296356|T296356]]
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14100 and previous config saved to /var/cache/conftool/dbconfig/20210201-122804-root.json
* 07:05 elukey: restart rsyslog on ml-serve-ctrl2001
* 12:25 urbanecm@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: {{Gerrit|ec5b6d221b50d0b3807242d7a8869f97e6cbdbef}}: Publish logos.php at noc.wikimedia.org (2/2; [[phab:T273330|T273330]]) (duration: 01m 05s)
* 12:24 urbanecm@deploy1001: Synchronized docroot/noc/conf/logos.php.txt: {{Gerrit|ec5b6d221b50d0b3807242d7a8869f97e6cbdbef}}: Publish logos.php at noc.wikimedia.org (1/2; [[phab:T273330|T273330]]) (duration: 01m 04s)
* 12:20 Lucas_WMDE: EU backport & config window done
* 12:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:660774{{!}}wikidata: post edit constraint jobs on 40% of edits (T204031)]] (duration: 01m 03s)
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 60%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14099 and previous config saved to /var/cache/conftool/dbconfig/20210201-121301-root.json
* 12:12 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9836287e0}}, {{Gerrit|424efdcdb}}: [WikibaseMediaInfo] Set wgMediaInfoMediaSearchHasLtrPlugin & wgMediaInfoMediaSearchConceptChipsSimpleHeuristics (duration: 01m 10s)
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14098 and previous config saved to /var/cache/conftool/dbconfig/20210201-115757-root.json
* 11:50 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
* 11:49 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=^swift,name=codfw
* 11:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:660807{{!}} Bumping portals to master (T128546)]] (duration: 01m 04s)
* 11:46 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:660807{{!}} Bumping portals to master (T128546)]] (duration: 01m 14s)
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 30%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14097 and previous config saved to /var/cache/conftool/dbconfig/20210201-114254-root.json
* 11:28 XioNoX: push pfw policies - [[phab:T272073|T272073]]
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14096 and previous config saved to /var/cache/conftool/dbconfig/20210201-112750-root.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 20%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14095 and previous config saved to /var/cache/conftool/dbconfig/20210201-111246-root.json
* 11:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14094 and previous config saved to /var/cache/conftool/dbconfig/20210201-110102-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 15%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14093 and previous config saved to /var/cache/conftool/dbconfig/20210201-105743-root.json
* 10:54 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.28 (duration: 07m 48s)
* 10:46 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.28
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14092 and previous config saved to /var/cache/conftool/dbconfig/20210201-104559-root.json
* 10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1047.eqiad.wmnet with reason: reboot
* 10:45 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1047.eqiad.wmnet with reason: reboot
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 12%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14091 and previous config saved to /var/cache/conftool/dbconfig/20210201-104240-root.json
* 10:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: Fixing [[phab:T273317|T273317]] [[phab:T273296|T273296]] (duration: 01m 01s)
* 10:41 urbanecm@deploy1001: sync-file aborted: Fixing [[phab:T273317|T273317]] [[phab:T273296|T273296]] (duration: 00m 12s)
* 10:39 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/user//User.php: Fixing [[phab:T273317|T273317]] [[phab:T273296|T273296]] (duration: 00m 58s)
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 60%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14090 and previous config saved to /var/cache/conftool/dbconfig/20210201-103055-root.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14089 and previous config saved to /var/cache/conftool/dbconfig/20210201-102736-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14088 and previous config saved to /var/cache/conftool/dbconfig/20210201-101552-root.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 9%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14087 and previous config saved to /var/cache/conftool/dbconfig/20210201-101233-root.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 30%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14086 and previous config saved to /var/cache/conftool/dbconfig/20210201-100048-root.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 8%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14085 and previous config saved to /var/cache/conftool/dbconfig/20210201-095729-root.json
* 09:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es4 into writes [[phab:T266483|T266483]] (duration: 00m 56s)
* 09:46 marostegui: Restart mysql on es1021 [[phab:T266483|T266483]]
* 09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es4 from writes [[phab:T266483|T266483]] (duration: 01m 04s)
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14084 and previous config saved to /var/cache/conftool/dbconfig/20210201-094545-root.json
* 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 7%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14083 and previous config saved to /var/cache/conftool/dbconfig/20210201-094226-root.json
* 09:39 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 20%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14082 and previous config saved to /var/cache/conftool/dbconfig/20210201-093041-root.json
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 6%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14081 and previous config saved to /var/cache/conftool/dbconfig/20210201-092722-root.json
* 09:27 dcausse: restarting blazegraph on wdqs1013
* 09:24 XioNoX: renumber gr-3/3/0.1 local endpoint on cr1-eqiad
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 15%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14080 and previous config saved to /var/cache/conftool/dbconfig/20210201-091538-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14079 and previous config saved to /var/cache/conftool/dbconfig/20210201-091218-root.json
* 09:04 gilles@deploy1001: Finished deploy [performance/navtiming@3215510]: [[phab:T271208|T271208]] browser_minor is needed for Mobile Safari allowlist (duration: 00m 05s)
* 09:04 gilles@deploy1001: Started deploy [performance/navtiming@3215510]: [[phab:T271208|T271208]] browser_minor is needed for Mobile Safari allowlist
* 09:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1054.eqiad.wmnet with reason: reboot
* 09:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1054.eqiad.wmnet with reason: reboot
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 12%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14078 and previous config saved to /var/cache/conftool/dbconfig/20210201-090034-root.json
* 09:00 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14077 and previous config saved to /var/cache/conftool/dbconfig/20210201-085714-root.json
* 08:56 marostegui: Stop MySQL on db1089 - [[phab:T273417|T273417]]
* 08:53 gilles@deploy1001: Finished deploy [performance/navtiming@1e02d76]: [[phab:T271208|T271208]] Add more debug logging (duration: 00m 05s)
* 08:53 gilles@deploy1001: Started deploy [performance/navtiming@1e02d76]: [[phab:T271208|T271208]] Add more debug logging
* 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14075 and previous config saved to /var/cache/conftool/dbconfig/20210201-084531-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from dbctl [[phab:T273417|T273417]]', diff saved to https://phabricator.wikimedia.org/P14074 and previous config saved to /var/cache/conftool/dbconfig/20210201-084523-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 4%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14073 and previous config saved to /var/cache/conftool/dbconfig/20210201-084211-root.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 7%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14072 and previous config saved to /var/cache/conftool/dbconfig/20210201-082933-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14071 and previous config saved to /var/cache/conftool/dbconfig/20210201-082707-root.json
* 08:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1166 with minimal weight for the first time [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14070 and previous config saved to /var/cache/conftool/dbconfig/20210201-081554-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14069 and previous config saved to /var/cache/conftool/dbconfig/20210201-081429-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1166 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14068 and previous config saved to /var/cache/conftool/dbconfig/20210201-080520-marostegui.json
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 3%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14067 and previous config saved to /var/cache/conftool/dbconfig/20210201-075926-root.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14066 and previous config saved to /var/cache/conftool/dbconfig/20210201-074422-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 with some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14065 and previous config saved to /var/cache/conftool/dbconfig/20210201-073603-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14064 and previous config saved to /var/cache/conftool/dbconfig/20210201-070429-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14063 and previous config saved to /var/cache/conftool/dbconfig/20210201-064926-root.json
* 06:39 marostegui: Run analyze table on db2071 and db2102
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14062 and previous config saved to /var/cache/conftool/dbconfig/20210201-063422-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14061 and previous config saved to /var/cache/conftool/dbconfig/20210201-062358-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14060 and previous config saved to /var/cache/conftool/dbconfig/20210201-061919-root.json
* 06:10 marostegui: Upgrade db2071 and db2102 to 10.4.18
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14059 and previous config saved to /var/cache/conftool/dbconfig/20210201-060415-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P14058 and previous config saved to /var/cache/conftool/dbconfig/20210201-055851-marostegui.json


== 2021-01-29 ==
== 2022-08-07 ==
* 23:26 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" {{!}} mwscript purgeList.php --wiki enwiki # [[phab:T314712|T314712]]
* 22:36 dancy@deploy1001: Finished scap: MW servers complaining about l10n files after .27 rollback (duration: 07m 22s)
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
* 22:29 dancy@deploy1001: Started scap: MW servers complaining about l10n files after .27 rollback
* 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 22:26 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 22:20 reedy@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: CacheTime: Extra protection for rollback unserialization [[phab:T273007|T273007]] (duration: 01m 00s)
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
* 22:14 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
* 22:09 dancy@deploy1001: scap failed: average error rate on 8/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
* 21:42 razzi: rebalance kafka partitions for codfw.resource_change
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
* 21:40 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
* 19:26 razzi@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 19:26 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 18:50 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes
* 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
* 17:31 reedy@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.config.js: [[phab:T273231|T273231]] (duration: 01m 02s)
* 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
* 16:56 effie: depool mw1403 and mw1405
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-presto1001.eqiad.wmnet
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json
* 15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-presto1001.eqiad.wmnet
* 14:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 14:56 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:48 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 12:38 hnowlan: uploaded osmborder_0.1.0-2~buster0 package to buster-wikimedia
* 12:00 gilles@deploy1001: Finished deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events (duration: 00m 06s)
* 12:00 gilles@deploy1001: Started deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events
* 11:42 dcausse@deploy1001: Synchronized wmf-config/unitConversionConfig.json: [[phab:T270252|T270252]]: Update unitConversionConfig.json (duration: 01m 01s)
* 11:39 gilles@deploy1001: Finished deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check (duration: 00m 05s)
* 11:39 gilles@deploy1001: Started deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check
* 11:26 gilles@deploy1001: Finished deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field (duration: 00m 06s)
* 11:26 gilles@deploy1001: Started deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field
* 11:06 gilles@deploy1001: Finished deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events (duration: 00m 05s)
* 11:06 gilles@deploy1001: Started deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events
* 11:04 elukey: upload presto-* version 0.246-1 packages to buster/stretch-wikimedia
* 10:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:45 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14050 and previous config saved to /var/cache/conftool/dbconfig/20210129-103505-root.json
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14049 and previous config saved to /var/cache/conftool/dbconfig/20210129-102001-root.json
* 10:18 vgutierrez: pool cp5006
* 10:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14048 and previous config saved to /var/cache/conftool/dbconfig/20210129-100458-root.json
* 09:51 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 09:50 vgutierrez: reboot cp5006
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14047 and previous config saved to /var/cache/conftool/dbconfig/20210129-094954-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14046 and previous config saved to /var/cache/conftool/dbconfig/20210129-093451-root.json
* 09:32 marostegui: Expand lvs on db1155-db1175 [[phab:T258361|T258361]]
* 09:31 vgutierrez: depool cp5006
* 08:20 marostegui: Change buffer pool sizes on clouddb1013,1015,1017,1019 [[phab:T267090|T267090]]
* 07:11 marostegui: Upgrade pc2007 to 10.4.18 [[phab:T268457|T268457]]
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175', diff saved to https://phabricator.wikimedia.org/P14044 and previous config saved to /var/cache/conftool/dbconfig/20210129-065529-marostegui.json
* 03:35 marostegui: Reload haproxy1018
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2252.codfw.wmnet
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2251.codfw.wmnet
* 02:04 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|If0c71a983772c}} (duration: 00m 58s)
* 01:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 01:48 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 01:07 mutante: repooled mw2248,mw2249 - jobrunners/videoscalers now on buster
* 01:06 mutante: repooled mw2048,mw2049 - jobrunners/videoscalers now on buster
* 01:06 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
* 01:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2249.codfw.wmnet
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
* 00:19 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2261.codfw.wmnet
* 00:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2262.codfw.wmnet
* 00:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet


== 2021-01-28 ==
== 2022-08-06 ==
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2261.codfw.wmnet
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2262.codfw.wmnet
* 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 23:57 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2283.codfw.wmnet
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 23:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 03:02 krinkle@deploy1002: Synchronized w/: {{Gerrit|I9067d47fab0324}} (duration: 03m 25s)
* 23:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:34 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 23:14 mutante: reimaging jobrunners/videoscalers mw2248,mw2249
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 22:43 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: [[gerrit:658688{{!}}CacheTime: Extra protection for rollback unserialization (T273007)]] (duration: 00m 57s)
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:41 bblack: eqiad lvs should be back to normal state now with everything working
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:39 bblack: lvs1014 - apply https://gerrit.wikimedia.org/r/659439
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:37 bblack: lvs1013 - testing https://gerrit.wikimedia.org/r/659439 (expect nop, worked on 1015!)
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:36 bblack: lvs1015 - testing https://gerrit.wikimedia.org/r/659439 (expect nop)
* 22:21 bblack: lvs1016 - trying https://gerrit.wikimedia.org/r/659439 on backup LVS...
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2287.codfw.wmnet
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2286.codfw.wmnet
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2285.codfw.wmnet
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2284.codfw.wmnet
* 22:16 bblack: disabling puppet on all eqiad lvs for https://gerrit.wikimedia.org/r/659439 risks
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2284.codfw.wmnet
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2285.codfw.wmnet
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2286.codfw.wmnet
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2287.codfw.wmnet
* 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 21:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 21:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 21:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 21:19 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 05s)
* 21:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
* 21:15 brennen: 1.36.0-wmf.28 train status ([[phab:T271342|T271342]]): blockers resolved, going to group1, to be followed shortly by all wikis
* 21:11 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CentralAuth/includes/: Backport: [[gerrit:659362{{!}}Revert CentralAuthCreateLocalAccountJob changes in 9f79de4 (T273205)]] (duration: 01m 09s)
* 20:49 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/tests/phpunit/includes/parser/ParserOptionsTest.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 07s)
* 20:46 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/parser/ParserOptions.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 08s)
* 20:25 bblack: lvs1014,lvs1016 - all back to "normal" state
* 20:24 bblack: lvs1014 - restart pybal
* 20:20 bblack: lvs1016 - restart pybal
* 20:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables (duration: 01m 44s)
* 20:13 bblack: lvs1014,lvs1016 - puppet temporarily disabled for new service config deploy - [[phab:T271476|T271476]]
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2223.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 20:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables
* 20:13 mutante: scap pulling and repooling: mw1264, mw2223, mw2247
* 20:11 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1019.eqiad.wmnet
* 20:10 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1018.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 19:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier (duration: 01m 09s)
* 19:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier
* 19:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --add-prefix=BROKEN --fix ([[phab:T271939|T271939]])
* 19:44 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --fix ([[phab:T271939|T271939]])
* 19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ae49093893316657ffd7cf56669a470fb073352}}: frwikisource: Add WS as an alias to NS_PROJECT ([[phab:T271939|T271939]]) (duration: 00m 57s)
* 19:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fd18092fd8b73414f6c320895601c83b883e29ee}}: Add image.laji.fi to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T270587|T270587]]) (duration: 01m 04s)
* 19:36 jynus: extending backup1001 /dev/mapper/array1-archive partition to allocate enough space for helium backups [[phab:T238048|T238048]]
* 19:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|519350b86bd4afc8d4efc3c2f9b2631a0ced22c2}}: frwiktionary: Change babel category names per community request ([[phab:T270186|T270186]]) (duration: 00m 59s)
* 19:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3d0ca3a11a59063e5adfc126702032ea357e8524}}: Create patroller user group for thwiki ([[phab:T272149|T272149]]) (duration: 01m 07s)
* 19:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:19 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 00m 08s)
* 19:19 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 19:15 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 16m 53s)
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e914f1e65adfdf2f41af97363501b0ba3c40d5b8}}: robots: cawikimedia: Set wgDefaultRobotPolicy to noindex,nofollow ([[phab:T272871|T272871]]) (duration: 01m 08s)
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:10 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables (duration: 01m 25s)
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:07 cdanis: decom Zayo IP transit on cr2-codfw [[phab:T272675|T272675]]
* 19:06 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for mediawiki_revision_recommendation_create (duration: 01m 12s)
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 18:58 cdanis: draining traffic from Zayo OGYX/123447 codfw<>ulsfo in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 18:58 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 18:58 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove [[phab:T257687|T257687]] mitigations (duration: 01m 10s)
* 18:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:34 mutante: reimaging another canary appserver, mw1264, so that we will have at least 2 stretch and 2 buster canaries for the transitional period
* 18:30 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 17:49 jgleeson: fundraising-tools tools updated from {{Gerrit|41cab089da}} to {{Gerrit|d64b2f8cee}}
* 17:38 crusnov@deploy1001: Finished deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]] (duration: 01m 18s)
* 17:37 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:35 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:28 ebernhardson: ban elastic1063 from production-search-omega-eqiad and production-search-eqiad [[phab:T265113|T265113]]
* 17:11 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 06s)
* 16:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 arturo: running homer on cr*-eqiad* again for reverting latest changes ([[phab:T271476|T271476]])
* 16:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:26 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:24 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:24 akosiaris: stop scraping apertium from prometheus, it doesn't have a prometheus endpoint.
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:17 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:03 arturo: running homer on cr*-eqiad* for [[phab:T271476|T271476]]
* 15:55 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:54 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:52 cdanis: draining traffic from Zayo OGYX/120003 codfw<>eqiad in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 15:49 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days (duration: 01m 15s)
* 15:49 marostegui: Power off clouddb1019 for memory replacement [[phab:T272125|T272125]]
* 15:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NavigationTiming schemas to Event Platform on all wikis - [[phab:T271208|T271208]] (duration: 01m 11s)
* 15:06 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:05 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:26 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:14 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148 after kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14039 and previous config saved to /var/cache/conftool/dbconfig/20210128-141425-marostegui.json
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14038 and previous config saved to /var/cache/conftool/dbconfig/20210128-135730-marostegui.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14037 and previous config saved to /var/cache/conftool/dbconfig/20210128-135612-root.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14036 and previous config saved to /var/cache/conftool/dbconfig/20210128-135602-root.json
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14035 and previous config saved to /var/cache/conftool/dbconfig/20210128-134109-root.json
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14034 and previous config saved to /var/cache/conftool/dbconfig/20210128-134057-root.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14033 and previous config saved to /var/cache/conftool/dbconfig/20210128-132605-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14032 and previous config saved to /var/cache/conftool/dbconfig/20210128-132553-root.json
* 13:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14031 and previous config saved to /var/cache/conftool/dbconfig/20210128-131101-root.json
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14030 and previous config saved to /var/cache/conftool/dbconfig/20210128-131050-root.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1024's weight', diff saved to https://phabricator.wikimedia.org/P14029 and previous config saved to /var/cache/conftool/dbconfig/20210128-125631-marostegui.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14028 and previous config saved to /var/cache/conftool/dbconfig/20210128-125558-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14027 and previous config saved to /var/cache/conftool/dbconfig/20210128-125546-root.json
* 12:48 dcausse: European mid-day backport window done
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14026 and previous config saved to /var/cache/conftool/dbconfig/20210128-123800-root.json
* 12:32 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 01m 09s)
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 80%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14025 and previous config saved to /var/cache/conftool/dbconfig/20210128-122256-root.json
* 12:22 marostegui: Reboot db1146:3312 db1146:3314
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312, db1146:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14024 and previous config saved to /var/cache/conftool/dbconfig/20210128-122118-marostegui.json
* 12:12 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T271493|T271493]]: [cirrus] set 50kb limit on file text indexing for commons (duration: 01m 09s)
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 70%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14023 and previous config saved to /var/cache/conftool/dbconfig/20210128-120752-root.json
* 12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T266027|T266027]]: [cirrus] Switch to perfield builder for spaceless languages (duration: 01m 06s)
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14022 and previous config saved to /var/cache/conftool/dbconfig/20210128-115249-root.json
* 11:45 gilles@deploy1001: Finished deploy [performance/navtiming@446e5df]: (no justification provided) (duration: 00m 05s)
* 11:45 gilles@deploy1001: Started deploy [performance/navtiming@446e5df]: (no justification provided)
* 11:37 vgutierrez: upgrade pybal to 1.15.9 in esams
* 11:30 elukey: disable nginx proxy buffering on archiva.wikimedia.org for a perf test - [[phab:T252767|T252767]]
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 30%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14020 and previous config saved to /var/cache/conftool/dbconfig/20210128-112242-root.json
* 11:21 vgutierrez: upgrade pybal to 1.15.9 in eqiad
* 11:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14019 and previous config saved to /var/cache/conftool/dbconfig/20210128-110739-root.json
* 11:04 marostegui: Restart mysql on es1025  [[phab:T266483|T266483]]
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14018 and previous config saved to /var/cache/conftool/dbconfig/20210128-110353-marostegui.json
* 11:01 _joe_: restarting php-fpm on the appserver, api and jobrunner clusters in eqiad, 10% at a time, for simulating scap rolling restarts [[phab:T266055|T266055]]
* 10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es5 on writes [[phab:T266483|T266483]] (duration: 01m 05s)
* 10:46 marostegui: Restart mysql on es1024  [[phab:T266483|T266483]]
* 10:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es5 from writes [[phab:T266483|T266483]] (duration: 01m 09s)
* 10:33 _joe_: performing a test-run of the rolling restart of php-fpm in codfw, using the same code scap will use [[phab:T266055|T266055]]. Starting from the api cluster, then proceeding with others
* 10:15 _joe_: upgrading pybal on lvs2008
* 10:11 _joe_: upgrading pybal on lvs2009
* 10:10 vgutierrez: upgrade pybal to 1.15.9 in eqsin
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14017 and previous config saved to /var/cache/conftool/dbconfig/20210128-095642-root.json
* 09:48 _joe_: upgrading pybal to 1.15.9 in codfw, starting from lvs2010
* 09:47 jbond42: upload new cas package to apt
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 80%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14016 and previous config saved to /var/cache/conftool/dbconfig/20210128-094139-root.json
* 09:30 _joe_: upgrading pybal on lvs4006
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 70%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14015 and previous config saved to /var/cache/conftool/dbconfig/20210128-092635-root.json
* 09:25 _joe_: upgrading pybal on lvs4005
* 09:11 _joe_: installing pybal 1.15.9 on lvs4007
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14014 and previous config saved to /var/cache/conftool/dbconfig/20210128-091131-root.json
* 09:08 moritzm: installing perf updates on Stretch
* 09:06 marostegui: Testing wikitech
* 09:00 _joe_: uploading pybal 1.15.9 to apt.wikimedia.org
* 08:58 moritzm: installing perf updates on Buster
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14013 and previous config saved to /var/cache/conftool/dbconfig/20210128-085627-root.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14012 and previous config saved to /var/cache/conftool/dbconfig/20210128-084123-root.json
* 08:34 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14011 and previous config saved to /var/cache/conftool/dbconfig/20210128-083347-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14010 and previous config saved to /var/cache/conftool/dbconfig/20210128-083337-root.json
* 08:32 vgutierrez: pool cp1087 - [[phab:T273153|T273153]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 30%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14009 and previous config saved to /var/cache/conftool/dbconfig/20210128-082620-root.json
* 08:20 vgutierrez: restart purged on cp1087 - [[phab:T273153|T273153]]
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14008 and previous config saved to /var/cache/conftool/dbconfig/20210128-081843-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14007 and previous config saved to /var/cache/conftool/dbconfig/20210128-081834-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14006 and previous config saved to /var/cache/conftool/dbconfig/20210128-081116-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14005 and previous config saved to /var/cache/conftool/dbconfig/20210128-080340-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14004 and previous config saved to /var/cache/conftool/dbconfig/20210128-080330-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14003 and previous config saved to /var/cache/conftool/dbconfig/20210128-075613-root.json
* 07:54 moritzm: installing tomcat9 security updates
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14002 and previous config saved to /var/cache/conftool/dbconfig/20210128-074836-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14001 and previous config saved to /var/cache/conftool/dbconfig/20210128-074827-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14000 and previous config saved to /var/cache/conftool/dbconfig/20210128-073426-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13999 and previous config saved to /var/cache/conftool/dbconfig/20210128-073333-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13998 and previous config saved to /var/cache/conftool/dbconfig/20210128-073323-root.json
* 07:25 elukey: powercycle cp1087 (after depooling it)
* 07:24 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13997 and previous config saved to /var/cache/conftool/dbconfig/20210128-072154-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13996 and previous config saved to /var/cache/conftool/dbconfig/20210128-072120-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13995 and previous config saved to /var/cache/conftool/dbconfig/20210128-072036-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to s1 for the first time, with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13994 and previous config saved to /var/cache/conftool/dbconfig/20210128-063806-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13993 and previous config saved to /var/cache/conftool/dbconfig/20210128-063655-marostegui.json
* 03:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 03:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2291.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2290.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2291.codfw.wmnet
* 02:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:33 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 01:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2294.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2293.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2292.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2294.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2293.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2292.codfw.wmnet
* 00:50 Urbanecm: Evening B&C done
* 00:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87c304c5439b1b7898f951db61d0a0a8a11ee4f7}}: Disable max-width on page namespace for wikisource ([[phab:T260091|T260091]]; 2nd take) (duration: 01m 00s)
* 00:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
* 00:41 foks: reset email for User:Uwe Martens
* 00:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
* 00:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.wmnet
* 00:33 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/includes/: {{Gerrit|c5c39ba8b3fce3f946e161191b814446aa5c1f4b}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 04s)
* 00:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:31 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: {{Gerrit|a67fe4f7cbf172b82153aaceaa93a067cdff2ae4}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 07s)
* 00:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:26 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/BaseModule.php: {{Gerrit|5417e0c8518b54144b99c963a1bbff3d15a00b32}}: Fix BaseModule::BASE_CSS_CLASS visibility ([[phab:T273099|T273099]]) (duration: 01m 00s)
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:12 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
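
The gradual repooling entries above (db1144:3314, db1144:3315 and db1169) share a regular shape, so the percentage steps can be extracted from the log mechanically. The following is a minimal sketch, assuming every entry follows the `* HH:MM who: message` layout seen in this log and that all entries come from the same day. The names `parse_entry`, `repool_steps` and both regular expressions are illustrative and not part of any Wikimedia tooling; the sample lines are shortened copies of entries above.

```python
import re

# Assumed entry layout: "* HH:MM who: message" (inferred from the log above).
ENTRY_RE = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<who>[^:]+): (?P<msg>.*)$")
# Assumed dbctl repool wording: "'<host> (re)pooling @ <N>%".
REPOOL_RE = re.compile(r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%")


def parse_entry(line):
    """Split one log line into time, who and msg; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def repool_steps(lines):
    """Map each host to its (time, percent) repool steps, oldest first."""
    steps = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        match = REPOOL_RE.search(entry["msg"])
        if match:
            step = (entry["time"], int(match.group("pct")))
            steps.setdefault(match.group("host"), []).append(step)
    for host_steps in steps.values():
        # HH:MM strings sort chronologically within a single day.
        host_steps.sort()
    return steps


# Shortened copies of entries from the log above (newest first, as logged).
SAMPLE = [
    "* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After upgrading the kernel'",
    "* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Pooling for the first time very slowly'",
    "* 07:54 moritzm: installing tomcat9 security updates",
    "* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After upgrading the kernel'",
    "* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After upgrading the kernel'",
]

if __name__ == "__main__":
    for host, host_steps in sorted(repool_steps(SAMPLE).items()):
        print(host, host_steps)
```

Entries that are not repool commits, such as the tomcat9 update line, are skipped rather than treated as errors, because the log mixes many kinds of messages.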


== 2021-01-27 ==
== 2022-08-05 ==
* 23:30 shdubsh: reboot logstash2006
* 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
* 22:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
* 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
* 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
* 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2222.codfw.wmnet
* 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
* 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
* 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
* 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
* 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 21:57 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 21:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday (duration: 02m 23s)
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 21:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 21:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task (duration: 07m 54s)
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
* 21:09 ebernhardson@deploy1001: deploy aborted: airflow: hourly tasks must wait for yesterdays daily tank (duration: 00m 00s)
* 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily tank
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
* 20:58 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/libs/objectcache/RedisBagOStuff.php: Backport: [[gerrit:658780{{!}}objectcache: fix broken for loop in RedisBagOStuff::doSetMulti() (T273006)]] (duration: 01m 07s)
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports (duration: 02m 03s)
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
* 20:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports
* 20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2299.codfw.wmnet
* 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2217.codfw.wmnet
* 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 20:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
* 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
* 20:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2221.codfw.wmnet
* 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
* 20:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
* 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 20:30 brennen: 1.36.0-wmf.28 ([[phab:T271342|T271342]]): taking over train while dancy is afk; waiting on [[gerrit:658939]] to merge and will sync for verification on testwikis
* 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2216.codfw.wmnet
* 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
* 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2218.codfw.wmnet
* 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2219.codfw.wmnet
* 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 20:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
* 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
* 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 20:07 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 05s)
* 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 20:06 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 06s)
* 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
* 20:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
* 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
* 15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|53419ab6c0f2c306a68edb8979106bd42536211a}}: arwiki: Configure wgGEHomepageManualAssignmentMentorsList ([[phab:T273060|T273060]]) (duration: 00m 59s)
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:19 elukey: reboot an-launcher1002 for kernel upgrades
* 15:14 dancy@deploy1002: Finished scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
* 19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cabb2e2009f97bb86c1b8827c3cc61cc991c41a9}}: Declare 6 more NavigationTiming eventlogging streams and migrate on testwiki ([[phab:T271208|T271208]]) (duration: 01m 00s)
* 15:11 jbond: upload jolokia to puppet7 component
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9382a9879bd6823fd664c0d3721fd0a9dc0d56d8}}: Migrate WebUIActionsTracking schemas to Event Platform on all wikis ([[phab:T267347|T267347]],[[phab:T271164|T271164]]) (duration: 01m 03s)
* 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
* 19:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2215.codfw.wmnet
* 15:09 dancy@deploy1002: Started scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory
* 18:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet
* 15:09 jbond: upload test-chuck-clojure to puppet7 component
* 18:50 mutante: testreduce1001 - making nginx listen on IPv6 and restarting it [[phab:T266509|T266509]]
* 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 18:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 15:04 jbond: upload test-check-clojure to puppet7 component
* 18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 14:57 jbond: upload nippy-clojure to puppet7 component
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 14:43 jbond: upload fressian to puppet7 component
* 18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 14:40 jbond: upload test-generative-clojure to puppet7 component
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:30 Tchanders: Creating the table securepoll_log in votewiki and testwiki ([[phab:T271270|T271270]])
* 14:34 jbond: upload data-generators-clojure to puppet7 component
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 07s)
* 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 14:23 jbond: upload encore-clojure to puppet7 component
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 10s)
* 14:17 jbond: upload truss-clojure to puppet7 component
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 14:13 jbond: upload structured-logging-clojure to puppet7 component
* 18:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 14:06 jbond: upload murphy-clojure to puppet7 component
* 18:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
* 18:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002 (duration: 00m 05s)
* 13:49 jbond: upload kitchensink-clojure to puppet7 component
* 18:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply ([[phab:T314559|T314559]] [[phab:T314628|T314628]])', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
* 18:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2301.codfw.wmnet
* 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 18:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1406.eqiad.wmnet
* 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 13:09 sukhe: repool codfw
* 18:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1407.eqiad.wmnet
* 13:02 jbond: upload honeysql-clojure to puppet7 component
* 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 12:53 _joe_: progressive repool of services in codfw
* 18:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1407.eqiad.wmnet
* 12:24 moritzm: installing nano bugfix updates from bullseye point release
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2301.codfw.wmnet
* 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1406.eqiad.wmnet
* 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
* 17:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
* 17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
* 17:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 17:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 10:12 Amir1: dbmaint at s4@codfw ([[phab:T312863|T312863]])
* 17:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 17:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 16:54 elukey@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:40 elukey@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 16:38 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:21 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 16:18 moritzm: installing python-bottle security updates
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
* 15:42 elukey: umount /var/hadoop/data/r on an-worker1099 and restart hadoop daemons - [[phab:T273034|T273034]]
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
* 15:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on group0 and group1 - [[phab:T271208|T271208]] (duration: 01m 07s)
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 15:15 godog: bounce rsyslog on centrallog1001
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 00:18 mutante: restarting gerrit for config change - removing old replica [[phab:T313250|T313250]]
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:43 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:25 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13989 and previous config saved to /var/cache/conftool/dbconfig/20210127-123300-root.json
* 12:25 awight: EU bacon done
* 12:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658594{{!}}Enable bracket matching on the first wikis (T270238)]] (duration: 01m 07s)
* 12:20 awight@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CodeMirror: Backport: [[gerrit:658814{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 06s)
* 12:19 awight@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CodeMirror: Backport: [[gerrit:658815{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 14s)
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13988 and previous config saved to /var/cache/conftool/dbconfig/20210127-121756-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13987 and previous config saved to /var/cache/conftool/dbconfig/20210127-120253-root.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13986 and previous config saved to /var/cache/conftool/dbconfig/20210127-114749-root.json
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13985 and previous config saved to /var/cache/conftool/dbconfig/20210127-113245-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13984 and previous config saved to /var/cache/conftool/dbconfig/20210127-105735-marostegui.json
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with final weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13982 and previous config saved to /var/cache/conftool/dbconfig/20210127-102042-marostegui.json
* 10:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
* 10:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
* 10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
* 10:05 elukey: reboot matomo1002 for kernel upgrades
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13981 and previous config saved to /var/cache/conftool/dbconfig/20210127-100220-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13980 and previous config saved to /var/cache/conftool/dbconfig/20210127-093802-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13979 and previous config saved to /var/cache/conftool/dbconfig/20210127-091909-marostegui.json
* 09:04 jbond42: deploy fix to enable-puppet
* 09:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13978 and previous config saved to /var/cache/conftool/dbconfig/20210127-083618-marostegui.json
* 08:29 marostegui: Stop mysql on db1089 to clone db1169 [[phab:T258361|T258361]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 to clone db1169 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13976 and previous config saved to /var/cache/conftool/dbconfig/20210127-082826-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13975 and previous config saved to /var/cache/conftool/dbconfig/20210127-081150-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13974 and previous config saved to /var/cache/conftool/dbconfig/20210127-080753-marostegui.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13973 and previous config saved to /var/cache/conftool/dbconfig/20210127-080645-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13972 and previous config saved to /var/cache/conftool/dbconfig/20210127-075715-marostegui.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13971 and previous config saved to /var/cache/conftool/dbconfig/20210127-075142-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13970 and previous config saved to /var/cache/conftool/dbconfig/20210127-073638-root.json
* 07:26 elukey: powercycle analytics1073 - kernel soft lock up bug registered, os needs a reboot
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13969 and previous config saved to /var/cache/conftool/dbconfig/20210127-072135-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13968 and previous config saved to /var/cache/conftool/dbconfig/20210127-070502-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13967 and previous config saved to /var/cache/conftool/dbconfig/20210127-065715-marostegui.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13966 and previous config saved to /var/cache/conftool/dbconfig/20210127-063930-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13965 and previous config saved to /var/cache/conftool/dbconfig/20210127-061336-marostegui.json
* 06:03 twentyafterfour: phabricator appears to be up and running fine
* 06:03 twentyafterfour: phabricator is read-write
* 06:01 twentyafterfour: phabricator is read-only
* 06:00 marostegui: m3 master restart, phabricator will go on read only - [[phab:T272596|T272596]]
* 05:50 marostegui: Deploy schema change on s3 [[phab:T270055|T270055]]
* 03:48 ryankemper: (Restarted `wdqs-blazegraph` on `wdqs1012`)
* 02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021 (duration: 02m 59s)
* 02:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021
* 01:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 01:56 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@6c6b2cb]: 0.3.61 (duration: 07m 50s)
* 01:50 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.61` on canary `wdqs1003`; proceeding to rest of fleet
* 01:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@6c6b2cb]: 0.3.61
* 01:48 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.61`. Pre-deploy tests passing on canary `wdqs1003`
* 01:39 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup (duration: 01m 11s)
* 01:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2296.codfw.wmnet
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2295.codfw.wmnet
* 01:24 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Roll-out complete. Will monitor `wdqs-internal` for any issues. All the remaining `WDQS SPARQL` alerts should clear shortly
* 01:21 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Test queries to `wdqs1003.eqiad.wmnet` passed, and metrics in Grafana (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=1611706751381&to=1611710190405) look good. Rolling out to rest of fleet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2296.codfw.wmnet
* 01:20 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2295.codfw.wmnet
* 01:14 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps (duration: 03m 31s)
* 01:10 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:51 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Fixed typo in private key in commit `ea152df802b55e939d34494a4965ed83a80a24f2`. Puppet run on `wdqs1003` was successful as a result. Monitoring...
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Discovered source of the above failure; the secret key in the puppetmaster `/srv/private` repo has a typo in its name (my error): it had `wqds` instead of `wdqs`. Opening up a patch now
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:36 ryankemper: [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:20 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Disabled puppet on all `wdqs-internal` hosts; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657913
* 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Downtimed all `wdqs-internal` hosts on icinga
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal


== 2021-01-26 ==
== 2022-08-04 ==
* 23:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2297.codfw.wmnet
* 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org [[phab:T313250|T313250]]
* 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2298.codfw.wmnet
* 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2302.codfw.wmnet
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2297.codfw.wmnet
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2298.codfw
* 20:56 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark (duration: 06m 12s)
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:50 thcipriani@deploy1002: Started scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark
* 20:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:812391]] [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
* 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/


== 2021-01-25 ==
== 2022-08-03 ==
* 23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
* 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
* 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: START - Cookbook


== 2021-01-23 ==
== 2022-08-02 ==
* 22:21 volker-e@deploy1001: Finished deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427) (duration: 00m 06s)
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:21 volker-e@deploy1001: Started deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427)
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:05 ryankemper: Depooled `wdqs1013` (it has ~50 mins of lag to catch up on, and also the bad gateway above)
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:03 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1013`: `sudo systemctl restart wdqs-blazegraph`
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2332.codfw.wmnet
* 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site  /home) again after gerrit2002 was reimaged with buster [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2328.codfw.wmnet
* 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2332.codfw.wmnet
* 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2328.codfw.wmnet
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2330.codfw.wmnet
* 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2334.codfw.wmnet
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:48 foks: reset user email for Davey2010
* 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 01:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2330.codfw.wmnet
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2334.codfw.wmnet
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
* 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:46 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant ([[phab:T272526|T272526]]) (duration: 00m 57s)
* 21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: [[gerrit:819621{{!}}Fix appending of join conds (T312421 T314439)]] (duration: 03m 15s)
* 00:36 legoktm@deploy1001: Synchronized static/images/project-logos/: Add enwiki20 "Option A" fixed logos ([[phab:T272526|T272526]]) (duration: 00m 59s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22  refs [[phab:T308076|T308076]]
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 20:38 mutante: re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise [[phab:T313250|T313250]] [[phab:T243027|T243027]] [[phab:T279509|T279509]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm: UTC evening B&C window done
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: {{Gerrit|69e91528a5c6f372af520307dc2f4227b9981442}}: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: {{Gerrit|322a960e3777bc01fa8823908340c36e3851a648}}: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5fac0aaf8e76a6f8cc3302771eac068e4f866e5f}}: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
* 20:06 dancy@deploy1002: Finished scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
* 20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
* 20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
* 19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:55 dancy@deploy1002: Started scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18"
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
* 19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
* 19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
* 19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
* 19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
* 19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
* 19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org [[phab:T313250|T313250]]
* 19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]] (duration: 50m 39s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:52 ejegg: updated payments-wiki from {{Gerrit|589bb64e}} to {{Gerrit|e1b6036a}} (just i18n changes in extensions)
* 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
* 18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23  refs [[phab:T308076|T308076]]
* 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
* 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
* 17:35 moritzm: installing node-moment security updates
* 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
* 17:25 moritzm: installing fribidi security updates
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
* 17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
* 17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work [[phab:T309957|T309957]]
* 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
* 17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
* 17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
* 16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
* 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
* 16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
* 16:29 dancy@mwmaint1002: pull aborted:  (duration: 00m 07s)
* 16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early  # wedged, timer will restart it now with max_runtime_seconds
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
* 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
* 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
* 15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
* 15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
* 15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
* 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
* 15:24 moritzm: installing gnupg2 security updates
* 15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
* 14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
* 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
* 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
* 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
* 14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
* 14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work [[phab:T309957|T309957]]
* 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
* 14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
* 14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - [[phab:T314275|T314275]]
* 14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 13:56 godog: schedule poweroff for centrallog2002 at 16 utc - [[phab:T310070|T310070]]
* 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
* 13:53 godog: depool and poweroff prometheus2005 - [[phab:T310070|T310070]]
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
* 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
* 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754933{{!}}Enable usage tracking for statement for cebwiki (T296384)]] – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (2/2) (duration: 03m 21s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (1/2) (duration: 03m 16s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
* 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - [[phab:T309651|T309651]]
* 13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
* 13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a4499e5ac23a0558bed276e2b74134590afc5c95}}:  Revert "testwiki: Add mediawiki.web_ui.interactions stream" ([[phab:T314151|T314151]], [[phab:T311268|T311268]]) (duration: 03m 19s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c2fb8a58d8f62e29a15ebee26198e79e4597d24c}}: Enable RealtimePreview on Group 0 wikis ([[phab:T314150|T314150]]) (duration: 03m 21s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
* 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
* 13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:01 marostegui: dbmaint x1@eqiad [[phab:T314087|T314087]]
* 11:57 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:55 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:54 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 11:48 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:46 marostegui: dbmaint s4@eqiad [[phab:T314377|T314377]]
* 11:35 elukey: restart rsyslog on ml-serve1006
* 10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:49 godog: grow sda3 by 100G on thanos-be2004 - [[phab:T314275|T314275]]
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
* 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
* 10:05 jynus: shutdown dbprov2002 backup2005 backup2008 [[phab:T310070|T310070]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl [[phab:T313885|T313885]]', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
* 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
* 09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
* 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P32140 and previous config saved to /var/cache/conftool/dbconfig/20220802-094804-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32139 and previous config saved to /var/cache/conftool/dbconfig/20220802-094759-root.json
* 09:44 godog: grow sdb3 by 100G on thanos-be2004 - [[phab:T314275|T314275]]
* 09:43 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 09:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1019.eqiad.wmnet with OS bullseye
* 09:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P32138 and previous config saved to /var/cache/conftool/dbconfig/20220802-093259-root.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32137 and previous config saved to /var/cache/conftool/dbconfig/20220802-093254-root.json
* 09:30 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:30 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 09:26 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 09:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 09:22 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P32136 and previous config saved to /var/cache/conftool/dbconfig/20220802-091754-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32135 and previous config saved to /var/cache/conftool/dbconfig/20220802-091749-root.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2143', diff saved to https://phabricator.wikimedia.org/P32134 and previous config saved to /var/cache/conftool/dbconfig/20220802-091518-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P32133 and previous config saved to /var/cache/conftool/dbconfig/20220802-090250-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32132 and previous config saved to /var/cache/conftool/dbconfig/20220802-090245-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P32131 and previous config saved to /var/cache/conftool/dbconfig/20220802-084745-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32130 and previous config saved to /var/cache/conftool/dbconfig/20220802-084740-root.json
* 08:46 marostegui: stop mysql on db2095 db2107 db2109 db2137 db2147 db2159 db2160 pc2012 for pdu maintenance on codfw b5 [[phab:T310070|T310070]]
* 07:49 moritzm: upgrading drmrs ganeti clusters to 3.0.2 [[phab:T312637|T312637]]
* 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:22 godog: bounce icinga on alert2001 - [[phab:T314353|T314353]]
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:58 elukey: restart rsyslog on ml-serve2006
* 06:56 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:819077{{!}}pruneRevData: Make cleaning in larger batches (T296380)]] (duration: 03m 26s)
* 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:46 godog: bounce icinga on alert1001 - [[phab:T314353|T314353]]
* 05:48 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2088.codfw.wmnet
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2088.codfw.wmnet
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P32127 and previous config saved to /var/cache/conftool/dbconfig/20220802-052923-root.json
* 05:24 marostegui: dbmaint x1@eqiad [[phab:T314087|T314087]]
* 04:17 ryankemper: [Elastic] Small amendment to my earlier statement; based off epoch time `be_x_oldwiki_titlesuggest_1659407912` was not an old index hanging around after a reindex operation, but rather the new one that the reindex operation was trying to create, but had not yet finished (therefore didn't switch over the aliases). It presumably got interrupted by the reimage of `elastic2059`.
* 04:15 ryankemper: [Elastic] Blew away red index like so: `ryankemper@cumin1001:~$ curl -XDELETE https://search.svc.codfw.wmnet:9243/be_x_oldwiki_titlesuggest_1659407912`. Cluster is back to `green` status.
* 04:07 ryankemper: [Elastic] Per `curl -s https://search.svc.codfw.wmnet:9243/_cat/aliases {{!}} grep -i be_x` I see the `be_x_oldwiki_titlesuggest` alias points to `be_x_oldwiki_titlesuggest_1658396688`. I think this means the red index is an old index from an in-progress reindex operation. I likely just need to delete `be_x_oldwiki_titlesuggest_1659407912` but doing some quick digging first
* 04:04 ryankemper: [Elastic] Red cluster status in main codfw elasticsearch cluster (`https://search.svc.codfw.wmnet:9243`); culprit appears to be index `be_x_oldwiki_titlesuggest_1659407912`. Confusingly it has 2 replicas set so it's not clear to me how we got into this state starting from green (in the past we've gone into red status from indices that erroneously had 0 replicas in production)
* 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I0802db272695}} (duration: 03m 10s)
* 03:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:34 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I9b89c0ff5c2}} (duration: 03m 32s)
* 03:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I6e97d39a3}}, {{Gerrit|Ib843ebced31}} (duration: 03m 30s)
* 03:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:22 krinkle@mwmaint1002: pull aborted:  (duration: 00m 11s)
* 03:21 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I39a2b86065}} (duration: 03m 19s)
* 03:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2059.codfw.wmnet with OS bullseye
* 03:15 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ieaea60a991e5611}} (duration: 03m 03s)
* 03:14 krinkle@mwmaint2002: pull aborted:  (duration: 01m 36s)
* 03:14 krinkle@mwmaint1002: pull aborted:  (duration: 01m 31s)
* 03:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
* 02:54 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` to clear `Query Service HTTP Port` && `WDQS SPARQL` alerts
* 02:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
* 02:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2059.codfw.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ieaea60a991e5}} (duration: 03m 10s)
* 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:23 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ia3406eba4ab8bb}} (duration: 03m 22s)
* 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
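
The 04:04 to 04:17 [Elastic] entries above reason about which index is newer from the number at the end of its name. A minimal sketch of that check, assuming the suffix is a Unix epoch timestamp in seconds (as the 04:17 entry implies); the helper name <code>index_created_at</code> is hypothetical and not part of any Wikimedia tooling:

```python
from datetime import datetime, timezone


def index_created_at(index_name: str) -> datetime:
    """Interpret the trailing _<digits> of an index name as epoch seconds (assumption)."""
    _, _, suffix = index_name.rpartition("_")
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


# Index names taken from the log entries above.
aliased = "be_x_oldwiki_titlesuggest_1658396688"  # index the alias pointed to
red = "be_x_oldwiki_titlesuggest_1659407912"      # index that was in red state

print(index_created_at(aliased).isoformat())  # 2022-07-21T09:44:48+00:00
print(index_created_at(red).isoformat())      # 2022-08-02T02:38:32+00:00

# The red index carries the later timestamp, so it is the newer one.
print(index_created_at(red) > index_created_at(aliased))  # True
```

Under that assumption the red index dates from 02:38 UTC on 2022-08-02, two minutes after the 02:36 start of the <code>elastic2059</code> reimage, which fits the 04:17 correction that it was the new, unfinished index and not a leftover old one.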


== 2021-01-22 ==
== 2022-08-01 ==
* 22:41 reedy@deploy1001: Synchronized invalid.json: (no justification provided) (duration: 00m 58s)
* 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Id1ce285631f5}}, {{Gerrit|I194d419fbfe}} (duration: 03m 09s)
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 21:08 moritzm: drain ganeti2028 [[phab:T309957|T309957]]
* 20:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 21:03
* 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet
* 19:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet
* 19:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 19:09 mutante: releases1002 systemctl reset-failed
* 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 19:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet
* 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet
* 18:17 mutante: releases2002 - rebooting to confirm works now and also new disk gets auto-mounted
* 18:03 mutante: releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 17:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 17:57 mutante: releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into [[phab:T272555|T272555]] but if it does now it's known how to fix
* 17:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 17:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 17:52 mutante: releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 17:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]] (duration: 65m 37s)
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 17:29 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s)
* 17:29 mforns@deploy1001: Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 17:23 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s)
* 17:13 mforns@deploy1001: Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277
See [[Server Admin Log/Archives]].
<noinclude>
<noinclude>

Revision as of 23:41, 12 August 2022

2022-08-12

  • 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg T315121
  • 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer T315121
  • 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
  • 21:25 ryankemper@cu