
Difference between revisions of "Network Error Logging"

imported>CDanis
m
imported>Jobo
 
[[Category:SRE Infrastructure Foundations]]

Latest revision as of 09:17, 14 June 2021

intro

There are many classes of reliability issues (e.g. failures or misconfigurations in intermediate networks) that we find out about only through direct, manual reports from users or, in very widespread cases, because traffic is visibly 'missing' and below expected rates.

Many modern browsers support a feature called Network Error Logging, or NEL. On successful requests, we ask browsers to remember "if you later encounter an error talking to us, let an error reporting endpoint know".
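To make the reporting step concrete, here is a minimal sketch of reading a batch of reports as a browser might deliver them. The field names follow the W3C Network Error Logging and Reporting API drafts as far as we know them; the sample payload, the URL, and the function name summarize_reports are hypothetical and are not taken from our receiver.

```python
import json

# Hypothetical payload: browsers POST a JSON array of reports to the endpoint.
SAMPLE_PAYLOAD = """
[
  {
    "age": 1200,
    "type": "network-error",
    "url": "https://www.example.org/",
    "body": {
      "type": "tcp.timed_out",
      "phase": "connection",
      "sampling_fraction": 0.05,
      "elapsed_time": 30000
    }
  }
]
"""


def summarize_reports(payload):
    """Return (error type, phase, url) for each network-error report."""
    summaries = []
    for report in json.loads(payload):
        # Reporting API endpoints can receive other report types; skip those.
        if report.get("type") != "network-error":
            continue
        body = report.get("body", {})
        summaries.append((body.get("type"), body.get("phase"), report.get("url")))
    return summaries


for error_type, phase, url in summarize_reports(SAMPLE_PAYLOAD):
    print(f"{error_type} during {phase} for {url}")
```

A real receiver would also validate the input and record the sampling fraction, so that observed counts can be scaled back to estimated totals.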

We ask browsers to enable NEL by serving the HTTP response headers Report-To and NEL. Together, these define a set of endpoints that can receive reports, separate sampling fractions for failures and for successes, and a TTL for how long the user's browser stores the whole definition. See also Sample Policy Definitions. For our implementation, see sub https_deliver_networkerrorlogging in wikimedia-frontend.vcl.erb.
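As an illustration of what such a header pair can look like, here is a minimal sketch that builds both values. It is not our VCL implementation: the endpoint URL, group name, TTL, and sampling fractions are all hypothetical, and the JSON field names are those of the W3C Network Error Logging and Reporting API drafts.

```python
import json


def build_nel_headers(endpoint_url, ttl_seconds, failure_fraction,
                      success_fraction, group="nel"):
    """Build illustrative Report-To and NEL header values as JSON strings."""
    # Report-To names an endpoint group and the URLs that receive reports.
    report_to = {
        "group": group,
        "max_age": ttl_seconds,
        "endpoints": [{"url": endpoint_url}],
    }
    # NEL refers to that group by name and sets the sampling fractions.
    nel = {
        "report_to": group,
        "max_age": ttl_seconds,
        "failure_fraction": failure_fraction,
        "success_fraction": success_fraction,
    }
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


# Hypothetical values: report 5% of failures and no successes, keep for one day.
headers = build_nel_headers("https://reports.example.org/nel", 86400, 0.05, 0.0)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The group name is what ties the two headers together: the NEL policy's report_to value must match a group defined in Report-To, otherwise the browser has nowhere to send reports.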

Dashboards

Our report receiver implementation

TODO EventGate, Kafka, logstash