You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Incident documentation/20190321-acmechief

From Wikitech-static
imported>Vgutierrez
 
imported>Krinkle
 
== Summary ==
#REDIRECT [[Incidents/20190321-acmechief]]
After acme-chief was upgraded to 0.14 and the uwsgi-acme-chief service was restarted on acmechief1001, acme-chief-api wrongly reported the files under /etc/acmecerts to Puppet as directories. This effectively wiped the TLS certificates used by every service whose certificates are managed by acme-chief.
 
The acme-chief API issue is fixed by https://gerrit.wikimedia.org/r/c/operations/software/acme-chief/+/498046
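The failure mode described in the summary, where certificate files were reported to Puppet with the wrong type, can be illustrated with a minimal sketch of a sanity check. It assumes the API's metadata responses resemble Puppet `file_metadata` JSON, with `path` and `type` keys. The function name `misreported_cert_paths` and the sample paths are hypothetical and are not part of acme-chief.

```python
from typing import Iterable, Mapping


def misreported_cert_paths(metadata: Iterable[Mapping[str, str]]) -> list[str]:
    """Return paths that should be regular files but are reported otherwise.

    Hypothetical check: each entry is assumed to look like a Puppet
    file_metadata response carrying "path" and "type" keys.
    """
    bad = []
    for entry in metadata:
        # Certificates and keys must be delivered as regular files. In this
        # incident, files reported as directories led Puppet to wipe them.
        if entry.get("type") != "file":
            bad.append(entry["path"])
    return bad


if __name__ == "__main__":
    # Hypothetical sample data, not taken from acmechief1001.
    sample = [
        {"path": "/etc/acmecerts/example.crt", "type": "file"},
        {"path": "/etc/acmecerts/example.key", "type": "directory"},
    ]
    print(misreported_cert_paths(sample))
```

Running the script prints the key path, because its entry is reported as a directory instead of a file.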
 
== Timeline ==
All times are in UTC
* 08:54 uwsgi-acme-chief is restarted on acmechief1001, making the acme-chief upgrade to 0.14 effective.
* 08:58 slapd crashes on seaborgium after a Puppet run destroys the TLS files in /etc/acmecerts
* 09:09 toolschecker pages CRITICAL for Test LDAP on checker.tools.wmflabs.org
* 09:13 acme-chief is downgraded to 0.12 and uwsgi-acme-chief is restarted
 
== Affected servers ==
The following servers were affected by this issue:
* sodium.wikimedia.org
* seaborgium.wikimedia.org
* cobalt.wikimedia.org
* gerrit2001.wikimedia.org
* netmon2001.wikimedia.org
* netmon1002.wikimedia.org
* mx1001.wikimedia.org
* mx2001.wikimedia.org
* ldap-eqiad-replica01.wikimedia.org
* ldap-eqiad-replica02.wikimedia.org
* fermium.wikimedia.org
* dbmonitor1001.wikimedia.org
* dbmonitor2001.wikimedia.org

Latest revision as of 17:47, 8 April 2022