Similar Varnish 'mailbox lag' problems as many times before.
Approximately 82k queries lost (HTTP 503 served instead). source
Automated monitoring -- Icinga alerts on traffic availability.
This is a step by step outline of what happened to cause the incident and how it was remedied. Include the lead-up to the incident, as well as any epilogue, and clearly indicate when the user-visible outage began and ended.
All times in UTC.
- 19:54 Varnish mailbox lag begins climbing on cp1083 OUTAGE BEGINS
- 19:56 first Icinga alert for HTTP availability
PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo
- 19:57 Varnish mailbox lag recovers on cp1083 but begins climbing on cp1085
- 20:02 Varnish mailbox lag recovers on cp1085 OUTAGE ENDS