Incidents/2019-04-23 varnish



A recurrence of the Varnish 'mailbox lag' problem seen many times before.


Approximately 82k queries were lost (HTTP 503 served instead).


Automated monitoring -- Icinga alerts on traffic availability.



All times in UTC.

  • 19:54 Varnish mailbox lag begins climbing on cp1083 OUTAGE BEGINS
  • 19:56 first Icinga alert for HTTP availability: "PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo"
  • 19:57 Varnish mailbox lag recovers on cp1083 but begins climbing on cp1085
  • 20:02 Varnish mailbox lag recovers on cp1085 OUTAGE ENDS
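
As a rough cross-check of the impact, the outage window from the timeline can be combined with the approximate count of lost queries to get an average failure rate. This is only a sketch: the timestamps have minute granularity, the 82k figure is approximate, and the function names are illustrative rather than taken from any Wikimedia tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline (UTC, 2019-04-23); minute granularity only.
OUTAGE_BEGINS = datetime(2019, 4, 23, 19, 54)
OUTAGE_ENDS = datetime(2019, 4, 23, 20, 2)

# Approximate figure from the impact statement, not an exact count.
LOST_QUERIES = 82_000


def outage_seconds(start: datetime, end: datetime) -> float:
    """Length of the user-visible outage window in seconds."""
    return (end - start).total_seconds()


def average_loss_rate(lost: int, seconds: float) -> float:
    """Average number of failed queries per second over the window."""
    return lost / seconds


duration = outage_seconds(OUTAGE_BEGINS, OUTAGE_ENDS)
rate = average_loss_rate(LOST_QUERIES, duration)

print(f"outage lasted {duration / 60:.0f} minutes")
print(f"roughly {rate:.0f} failed queries per second on average")
```

Under these assumptions the window is 8 minutes, which works out to an average on the order of 170 failed queries per second. The real rate was almost certainly uneven across the window, since the lag moved from cp1083 to cp1085 partway through.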

Graphs: Mailbox lag, HTTP availability


See Incident_documentation/20190416-varnish#Conclusions


See Incident_documentation/20190416-varnish#Actionables