Server Admin Log: Difference between revisions

imported>Stashbot
(XioNoX: configure OSPF between cr2-drmrs and cr2-eqdfw)
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json)
== 2022-02-27 ==
* 20:42 XioNoX: configure OSPF between cr2-drmrs and cr2-eqdfw
 
== 2022-02-25 ==
* 23:32 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 23:30 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21540 and previous config saved to /var/cache/conftool/dbconfig/20220225-213704-ladsgroup.json
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21539 and previous config saved to /var/cache/conftool/dbconfig/20220225-212159-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21538 and previous config saved to /var/cache/conftool/dbconfig/20220225-210654-ladsgroup.json
* 21:02 ryankemper: [WDQS] Restarted wdqs eqiad exporters: `ryankemper@cumin1001:~$ sudo -E cumin -b 1 'wdqs1*' 'systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service'`
* 21:01 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good. Still looking into `Reduced availability for job jmx_wdqs_updater`; will try restarting blazegraph exporters in eqiad
* 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21537 and previous config saved to /var/cache/conftool/dbconfig/20220225-205149-ladsgroup.json
* 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21536 and previous config saved to /var/cache/conftool/dbconfig/20220225-204844-ladsgroup.json
* 20:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 20:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21535 and previous config saved to /var/cache/conftool/dbconfig/20220225-204836-ladsgroup.json
* 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21534 and previous config saved to /var/cache/conftool/dbconfig/20220225-203331-ladsgroup.json
* 20:31 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 20:31 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 20:31 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 20:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@5d384a5]: 0.3.104 (duration: 07m 18s)
* 20:23 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.104` on canary `wdqs1003`; proceeding to rest of fleet
* 20:22 ryankemper@deploy1002: Started deploy [wdqs/wdqs@5d384a5]: 0.3.104
* 20:22 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.104`. Pre-deploy tests passing on canary `wdqs1003`
* 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21533 and previous config saved to /var/cache/conftool/dbconfig/20220225-201826-ladsgroup.json
* 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21532 and previous config saved to /var/cache/conftool/dbconfig/20220225-200322-ladsgroup.json
* 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21531 and previous config saved to /var/cache/conftool/dbconfig/20220225-195917-ladsgroup.json
* 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21530 and previous config saved to /var/cache/conftool/dbconfig/20220225-195658-ladsgroup.json
* 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21529 and previous config saved to /var/cache/conftool/dbconfig/20220225-194153-ladsgroup.json
* 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21528 and previous config saved to /var/cache/conftool/dbconfig/20220225-192649-ladsgroup.json
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21527 and previous config saved to /var/cache/conftool/dbconfig/20220225-191144-ladsgroup.json
* 19:11 jgleeson: payments updated from {{Gerrit|4638c0ec}} to {{Gerrit|3dfac3b2}}
* 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21526 and previous config saved to /var/cache/conftool/dbconfig/20220225-190939-ladsgroup.json
* 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21525 and previous config saved to /var/cache/conftool/dbconfig/20220225-190737-ladsgroup.json
* 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21524 and previous config saved to /var/cache/conftool/dbconfig/20220225-185233-ladsgroup.json
* 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21523 and previous config saved to /var/cache/conftool/dbconfig/20220225-183728-ladsgroup.json
* 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21522 and previous config saved to /var/cache/conftool/dbconfig/20220225-182223-ladsgroup.json
* 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21521 and previous config saved to /var/cache/conftool/dbconfig/20220225-181918-ladsgroup.json
* 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21520 and previous config saved to /var/cache/conftool/dbconfig/20220225-181911-ladsgroup.json
* 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21519 and previous config saved to /var/cache/conftool/dbconfig/20220225-180406-ladsgroup.json
* 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21518 and previous config saved to /var/cache/conftool/dbconfig/20220225-174901-ladsgroup.json
* 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21517 and previous config saved to /var/cache/conftool/dbconfig/20220225-173356-ladsgroup.json
* 17:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates: better error messages and code cleanup (prod) (duration: 08m 20s)
* 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21516 and previous config saved to /var/cache/conftool/dbconfig/20220225-172845-ladsgroup.json
* 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21515 and previous config saved to /var/cache/conftool/dbconfig/20220225-172837-ladsgroup.json
* 17:21 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates: better error messages and code cleanup (prod)
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21514 and previous config saved to /var/cache/conftool/dbconfig/20220225-171333-ladsgroup.json
* 17:12 ebernhardson: manual trigger of cirrus SaneitizeJobs for with 2hr refresh
* 17:01 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates: better error messages and code cleanup (duration: 01m 57s)
* 16:59 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates: better error messages and code cleanup
* 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21513 and previous config saved to /var/cache/conftool/dbconfig/20220225-165828-ladsgroup.json
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21512 and previous config saved to /var/cache/conftool/dbconfig/20220225-164323-ladsgroup.json
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21511 and previous config saved to /var/cache/conftool/dbconfig/20220225-164020-ladsgroup.json
* 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 16:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS buster
* 16:35 vgutierrez: pool cp3063 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
* 16:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
* 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
* 15:36 moritzm: imported PHP 7.4 7.4.28-1+0~20220217.59+debian10~1.gbp1950+wmf1+buster1 to component/php74 for buster-wikimedia [[phab:T271736|T271736]]
* 15:25 vgutierrez: pool cp5005 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5005.eqsin.wmnet with OS buster
* 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5005.eqsin.wmnet with reason: host reimage
* 14:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5005.eqsin.wmnet with reason: host reimage
* 14:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: fix wmf-puppet-dashboard routes (duration: 07m 47s)
* 14:05 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: fix wmf-puppet-dashboard routes
* 14:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5005.eqsin.wmnet with OS buster
* 13:56 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: deploying wmf-proxy-dashboard and wmf-puppet-dashboard changes for real after fixing the scap config (duration: 04m 50s)
* 13:52 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: deploying wmf-proxy-dashboard and wmf-puppet-dashboard changes for real after fixing the scap config
* off: restoring psql-all-dbs-20220225.sql.gz into netbox
* 13:30 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process (duration: 00m 06s)
* 13:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
* 13:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process (duration: 00m 05s)
* 13:29 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
* 12:46 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-proxy-dashboard on eqiad1 (duration: 02m 04s)
* 12:44 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-proxy-dashboard on eqiad1
* 12:39 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): updating wmf-proxy-dashboard (duration: 00m 37s)
* 12:39 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): updating wmf-proxy-dashboard
* 12:39 moritzm: drain instances off ganeti2007 [[phab:T302577|T302577]]
* 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS buster
* 12:32 vgutierrez: pool cp2040 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
* 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 11:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS buster
* 11:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 11:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4025.ulsfo.wmnet with OS buster
* 11:40 vgutierrez: pool cp4025 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
* 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
* 11:20 XioNoX: re-activate BGP session to Seabone in esams
* 11:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4025.ulsfo.wmnet with reason: host reimage
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4025.ulsfo.wmnet with reason: host reimage
* 11:04 moritzm: added ganeti2029 to codfw Ganeti cluster [[phab:T298998|T298998]]
* 10:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4025.ulsfo.wmnet with OS buster
* 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
* 10:41 moritzm: enabled virtualisation in BIOS for ganeti2029 [[phab:T298998|T298998]]
* 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: Enable virtualisation in BIOS
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: Enable virtualisation in BIOS
* 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 10:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 10:17 vgutierrez: rolling upgrade to HAProxy 2.4.13 on HAProxy cache nodes - [[phab:T290005|T290005]]
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
* 02:43 cstone: Donation Interface revision changed from {{Gerrit|a6a9b63e}} to {{Gerrit|4638c0ec}}
 
== 2022-02-24 ==
* 23:35 ryankemper: [[phab:T302526|T302526]] Deployed https://gerrit.wikimedia.org/r/765652 and ran puppet across wcqs*
* 22:06 mutante: static-bugzilla.wikimedia.org - kubernetes - deployed gerrit:765572  - first prod service behind a k8s ingress ([[phab:T290966|T290966]])
* 22:05 mutante: phabricator - disabled git repo - labs-tools-harvesting-data-refinery/repository/master/
* 21:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2086.codfw.wmnet with OS bullseye
* 21:45 brennen: end of UTC late backport & config window
* 21:43 dancy@deploy1002: Started scap: testing scap container image building
* 21:43 tzatziki: removing 1 file for legal compliance
* 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2085.codfw.wmnet with OS bullseye
* 21:41 mutante: phabricator - disabled git repo "frig" - outdated fundraising stuff, checked with fr-tech, not needed [[phab:T296022|T296022]]
* 21:40 brennen@deploy1002: Synchronized php-1.38.0-wmf.23/includes: Backport: [[gerrit:765626{{!}}Revert "Revert "Revert "Show message fallback keys when using &uselang=qqx"""]] (duration: 00m 57s)
* 21:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2086.codfw.wmnet with reason: host reimage
* 21:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2086.codfw.wmnet with reason: host reimage
* 21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2085.codfw.wmnet with reason: host reimage
* 21:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2085.codfw.wmnet with reason: host reimage
* 21:29 brennen@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: [[gerrit:765577{{!}}cirrus: Reduce write isolation to only cloudelastic (T295705)]] (duration: 00m 55s)
* 21:27 mutante: phabricator - disabling git repo rGEDS (Elasticdash) - only one commit from 2015  - [[phab:T296022|T296022]]
* 21:19 tzatziki: removing 1 file for legal compliance
* 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2086.codfw.wmnet with OS bullseye
* 21:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2083.codfw.wmnet with OS bullseye
* 21:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2085.codfw.wmnet with OS bullseye
* 21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2084.codfw.wmnet with OS bullseye
* 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage
* 21:05 tzatziki: removing 4 files for legal compliance
* 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage
* 21:02 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 03m 18s)
* 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage
 


==Archives==

Revision as of 01:14, 1 March 2022

2022-03-01

  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171

Archives

See Server Admin Log/Archives.