Incident documentation/20160606-otrsmail: Difference between revisions

imported>Alex Monk
#REDIRECT [[Incidents/20160606-otrsmail]]
== Summary ==
A broken clamav update prevented mail delivery on (OTRS) for about 11 hours. The new clamav package deprecated a config option, and clamd and freshclam refuse to start if that option is still present, which was the case in our puppetised config. No mail was lost; the queue was processed once the option was removed.
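The failure mode described above (a daemon refusing to start because its config still sets a removed option) can be caught before deployment by scanning the config for known-deprecated option names. The following is only a minimal sketch of that idea, not the check that was actually used: the function name <code>find_deprecated_options</code> and the option name <code>ExampleRemovedOption</code> are hypothetical placeholders, and the real list would have to come from the release notes of the ClamAV version being deployed.

```python
def find_deprecated_options(config_text, deprecated):
    """Return (line_number, option) pairs for deprecated options set in config_text."""
    hits = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        # The option name is the first whitespace-separated token on the line.
        option = line.split(None, 1)[0]
        if option in deprecated:
            hits.append((lineno, option))
    return hits


# Hypothetical option name, used only for illustration (an assumption,
# not the option involved in this incident).
DEPRECATED = {"ExampleRemovedOption"}

SAMPLE = """\
# sample clamd.conf fragment
LocalSocket /var/run/clamav/clamd.ctl
ExampleRemovedOption true
User clamav
"""

if __name__ == "__main__":
    for lineno, option in find_deprecated_options(SAMPLE, DEPRECATED):
        print(f"line {lineno}: deprecated option {option}")
```

Run against the puppetised config before a package upgrade, a non-empty result would signal that the option needs to be removed first.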
== Timeline ==
* 08:39: Moritz deploys the new clamav version as part of the jessie 8.5 point release
* 18:55: The error is reported by an OTRS admin in wikimedia-tech
* 19:14: Alex M pokes ops, Rob and Brandon start investigating
* 19:44: Brandon deploys a hotfix (and another one, verified via Puppet)
== Conclusions ==
* Monitoring didn't spot the error: the "OTRS Icinga" check didn't flag it, and we're missing Icinga checks for ClamAV and FreshClam.
* The problematic behaviour wasn't noticed before the update was deployed, so all further ClamAV updates need more scrutiny. ClamAV is handled differently from other packages in Debian: because its security changes are sometimes nontransparent, clamav is always updated to the latest version instead of receiving isolated fixes. Virus pattern updates also often need newer scan engine features.
== Actionables ==
* {{Status}} Icinga checks for ClamAV and FreshClam; double-check the OTRS Icinga plugin's behaviour if ClamAV fails ({{Bug|T137188}})
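A process check of the kind this actionable asks for could look roughly like the sketch below. It is an illustration under stated assumptions, not the check that was eventually deployed: the helper names <code>evaluate</code>, <code>running_process_names</code> and <code>main</code> are hypothetical, the daemons are assumed to show up under the process names <code>clamd</code> and <code>freshclam</code>, and the host is assumed to provide a <code>ps</code> that accepts <code>-e -o comm=</code>. The exit codes follow the usual Nagios/Icinga plugin convention (0 for OK, 2 for CRITICAL).

```python
import subprocess

OK, CRITICAL = 0, 2  # conventional Nagios/Icinga plugin exit codes


def evaluate(required, running):
    """Compare required process names against the running ones.

    Returns (exit_code, status_message).
    """
    missing = sorted(set(required) - set(running))
    if missing:
        return CRITICAL, "CRITICAL: not running: " + ", ".join(missing)
    return OK, "OK: running: " + ", ".join(sorted(required))


def running_process_names():
    """Return the set of command names of all processes, as reported by ps."""
    # `ps -e -o comm=` prints one command name per line, without a header.
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def main():
    """Print the status line and return the plugin exit code."""
    code, message = evaluate(["clamd", "freshclam"], running_process_names())
    print(message)
    return code

# As a plugin script, finish with: import sys; sys.exit(main())
```

Keeping <code>evaluate</code> separate from the process listing makes the decision logic easy to test without a running daemon. A process check alone would not prove that scanning works, so it complements rather than replaces the OTRS plugin review mentioned above.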
{{#ifeq:{{SUBPAGENAME}}|Report Template||
[[Category:Incident documentation]]
}}

Latest revision as of 17:45, 8 April 2022