Logstash/SRE onboard


Shipping logs to Logstash via Kafka

As part of the FY2019 TEC6 goals, SRE is implementing a new logging pipeline while taking ownership of the Logstash stack. See also the Logging infrastructure design document (source) for more details.

Implementation of the logging infrastructure is in progress as of FY2019 Q2; this page aims to outline migration steps for existing applications.

Syslog

Given how ubiquitous syslog is, it has been chosen as the transport of choice. More specifically, applications log using the local syslog interface (i.e. syslog(3), or by manually opening the /dev/log unix socket). The "identity" (also known as the syslog tag or program name) is used to opt applications in to shipping their logs to Logstash (via a rsyslog lookup table in puppet.git). The "level" as intended by syslog (i.e. info/err/warning/etc.) is also set to its appropriate value per message (see also structured logging below for more information).
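As a minimal sketch, an application can set the syslog tag with Python's standard logging module. The service name "myapp" is a placeholder here, not an entry from the actual rsyslog lookup table:

```python
import logging
import logging.handlers
import os

def make_syslog_logger(ident, address="/dev/log", level=logging.INFO):
    """Return a logger that writes to the local syslog socket with the
    given syslog tag ("identity"/program name)."""
    logger = logging.getLogger(ident)
    logger.setLevel(level)
    handler = logging.handlers.SysLogHandler(address=address)
    # SysLogHandler.ident is prepended to every message; rsyslog parses it
    # as the program name, which is what the lookup table matches on.
    handler.ident = ident + ": "
    logger.addHandler(handler)
    return logger

if os.path.exists("/dev/log"):
    log = make_syslog_logger("myapp")  # "myapp" is a placeholder tag
    # The syslog level is set per message by the logging method used:
    log.warning("disk usage above threshold")
```

The per-message level maps onto syslog severities automatically (e.g. log.warning → warning, log.error → err), matching the per-message level behaviour described above.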

Structured logging

Most applications using Logstash nowadays send structured logs (in other words, maps of key: value pairs). This use case is supported on the new logging infrastructure by sending JSON as the syslog message/payload. To have a robust way to detect JSON payloads, the "cee cookie" (rsyslog documentation) is required as a prefix to said JSON messages; in other words, a structured syslog message/payload looks like this (excluding syslog headers):

 @cee: {"foo": "bar"}

In case of parsing failure the message is regarded as a regular non-JSON message.
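Producing such a payload is a one-liner; this sketch simply prefixes the cookie to serialized JSON (the field names are illustrative):

```python
import json

CEE_COOKIE = "@cee: "

def cee_message(fields):
    """Serialize a dict of key/value pairs as a @cee-prefixed syslog payload."""
    return CEE_COOKIE + json.dumps(fields)

# cee_message({"foo": "bar"}) produces the payload shown above:
# '@cee: {"foo": "bar"}'
```

A message without the cookie, or whose JSON fails to parse, simply travels as a plain unstructured message.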

Kafka and JSON

In preparation for sending to Kafka, locally generated messages (structured or otherwise) are turned into JSON: message metadata (such as timestamp, source host, etc.) is added, and the resulting JSON is shipped to Kafka. In the structured-logging case, the JSON parsed from the message is merged with the message metadata before being shipped to Kafka. Logstash then consumes from Kafka and ingests the messages into its pipelines.
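The merge step can be sketched as follows. This is an illustration of the logic, not the pipeline's actual implementation or schema; the metadata field names (timestamp, host, message) are assumptions:

```python
import json
import socket
from datetime import datetime, timezone

CEE_COOKIE = "@cee: "

def to_kafka_record(raw_msg, host=None):
    """Turn a locally generated syslog message into the JSON record shipped
    to Kafka: metadata plus either the parsed payload or the raw text."""
    meta = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "host": host or socket.gethostname(),
    }
    if raw_msg.startswith(CEE_COOKIE):
        try:
            # Structured case: merge the parsed JSON with the metadata.
            return {**meta, **json.loads(raw_msg[len(CEE_COOKIE):])}
        except ValueError:
            pass  # parsing failure: treat as a regular non-JSON message
    # Unstructured case: ship the message verbatim under a single field.
    return {**meta, "message": raw_msg}
```

Note how a payload that carries the cookie but fails to parse falls through to the unstructured path, matching the parsing-failure behaviour described above.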

UDP logging

A local UDP endpoint accepting the same messages/format as /dev/log can also be configured. This isn't the preferred interface and should be regarded as a compatibility layer only. There are several reasons for such an endpoint: for example, the JVM doesn't support unix socket communication with /dev/log out of the box. Furthermore, UDP provides a non-blocking interface, useful when transitioning applications to local logging while the blocking behavior of writing to /dev/log isn't yet known.
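For illustration, sending to such an endpoint looks like this; the port (10514) and tag ("myapp") are placeholders and must match whatever the local rsyslog is actually configured to listen on:

```python
import socket

def format_syslog(message, ident="myapp", severity=6, facility=1):
    """Build an RFC 3164-style datagram: "<pri>tag: message".
    PRI = facility * 8 + severity, e.g. user(1)/info(6) -> <14>."""
    pri = facility * 8 + severity
    return "<%d>%s: %s" % (pri, ident, message)

def send_udp_syslog(message, addr=("127.0.0.1", 10514), **kwargs):
    """Fire-and-forget: a UDP sendto never blocks on a slow logging daemon,
    unlike a write to /dev/log can."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_syslog(message, **kwargs).encode("utf-8"), addr)
    finally:
        sock.close()

# Structured payloads travel over UDP unchanged, cookie and all:
send_udp_syslog('@cee: {"event": "startup"}', ident="myapp")
```

Because the datagram format is identical to what /dev/log accepts, an application can switch between the two transports without changing how it builds its messages.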