User:Cwhite/Logstash/ECS Schema Guide for Developers
Software is often opinionated about what constitutes a log entry. Since WMF's centralized logging infrastructure became generally available, it has experienced incredible organic growth. This growth presents challenges in the storage, ingest, and presentation domains. One such issue is that there is no definition of how many fields can be set and, consequently, no typing information is provided. Without control over the type of these fields, Elasticsearch must guess the type, making type collisions a regular occurrence. Without control over which fields are available, fields remain largely undefined and meaningless to the outside observer. As we strive to boost signal, reduce noise, scale, simplify, and improve the user experience of the centralized logging system, we see the need to agree on a Common Logging Schema. The Observability team has evaluated options and decided to adopt the Elastic Common Schema (ECS).
ECS logs are identified by including the ECS version in the structured log event. This field is ecs.version and should contain the ECS version the log event is targeting.
The structured log object (a JSON object) consists of a set of attributes. There are a few common attributes (also known as fields) that almost every log source will want to populate. When possible, please follow the attribute content recommendations in this document.
The timestamp attribute contains an ISO-8601 formatted timestamp indicating the time, in UTC, at which the log was generated. This field will be translated to the native date type and moved to @timestamp. If it is not provided, the logging pipeline will generate the @timestamp field indicating the time the event was received by the logging pipeline.
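As a rough illustration of the two attributes described so far, the sketch below builds a minimal event with ecs.version and a UTC ISO-8601 timestamp. The helper name `make_event`, the version string, and the use of dotted keys (rather than nested objects) are assumptions made for this example, not requirements stated in this guide.

```python
import json
from datetime import datetime, timezone

# Placeholder: set this to the ECS version your events actually target.
ECS_VERSION = "1.6.0"


def make_event(message, when=None):
    """Build a minimal structured log event (hypothetical helper)."""
    # Use the supplied timezone-aware datetime, or the current time,
    # and normalize it to UTC before formatting.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "ecs.version": ECS_VERSION,
        # ISO-8601 in UTC, for example 2021-01-02T03:04:05.000+00:00
        "timestamp": when.isoformat(timespec="milliseconds"),
        "message": message,
    }


print(json.dumps(make_event("Cache warmup finished")))
```

If the producer omits timestamp entirely, the pipeline's receive time is used instead, so setting it at the source keeps the generation time accurate.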
message is a short summary or message optimized for viewing in a log viewer. When a message is not provided, it can be constructed from other fields to provide a human-readable summary of the log entry.
The message field is analyzed as a natural language text type. This means that the message field will be:
- tokenized -- the text is broken up on whitespace, stop words, and optionally non-letter characters
- filtered -- the tokens are downcased and stemmed
- indexed -- the filtered tokens are then indexed into an inverted index indicating which documents the token can be found in.
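The three steps above can be sketched with a toy analyzer. This is only an illustration of the idea: the stop-word list, the suffix-stripping rules in `toy_stem`, and all function names are invented for this example and do not reproduce the analyzer Elasticsearch actually uses.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (assumption, not the real analyzer's list).
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "in", "and"}


def toy_stem(token):
    """Very rough suffix stripping; real analyzers use rule-based stemmers."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        # Collapse a doubled final consonant: "runn" -> "run".
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
        return token
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text):
    """Tokenize on non-alphanumeric characters, drop stop words, downcase, stem."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each filtered token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


index = build_index({1: "Running browsers", 2: "Browser crashed"})
print(sorted(index["browser"]))
```

Because both documents reduce to the token "browser", a search for either "browser" or "browsers" would match both of them in this toy index.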
The message field is often the first field a user will look at when searching for diagnostic information. While there are no restrictions on what data is allowed in the message field, we recommend optimizing the field for human consumption.
How to tell if a piece of information is not a good fit for the message field:
- Would this information be glossed over when a user reads the message?
- Is the piece of information useful for measurement?
- Would it take multiple lines to summarize the event?
If the answer to any of the above questions is "yes," consider moving the datapoint(s) to their own field as defined in the ECS documentation or the
Common datapoints with their own fields:
- Event (UU)IDs:
- Stack traces:
- HTTP data:
- URL data:
- (... this list is incomplete)
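Part of the message checklist above can be approximated mechanically. The sketch below flags messages that span multiple lines or that exceed 180 characters, the width this guide's notes describe as fitting comfortably on one line in Kibana. The function name and the choice to treat these two conditions as warnings are assumptions for illustration; whether a datapoint is "useful for measurement" still needs human judgement.

```python
# Assumption: reuse the 180-character figure from this guide's notes as a soft limit.
MAX_MESSAGE_CHARS = 180


def message_warnings(message):
    """Return a list of reasons a message may not read well in a log viewer."""
    warnings = []
    # Leading/trailing newlines are ignored; embedded ones mean a multi-line message.
    if "\n" in message.strip():
        warnings.append("multi-line")
    if len(message) > MAX_MESSAGE_CHARS:
        warnings.append("too-long")
    return warnings


print(message_warnings("Upstream timed out"))
print(message_warnings("Traceback:\n  File example.py, line 1"))
```

A message that triggers either warning is a candidate for moving some of its content into dedicated fields.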
The log.level field is a human-readable string and is indexed as a keyword. If log.level is omitted, the logging pipeline will attempt to populate it with:
- The value at
- The human-readable definition of
- NOTSET if no other level indicator could be found.
For log producers that emit JSON-formatted messages and define their own level, log.level is used to populate log.syslog.severity.code per this table:
|log.level||RFC5424 definition||Lowercase RFC5424 Severity||RFC5424 Severity code||PHP||Java||NodeJS||Python||Syslog|
|trace, debug||debug-level messages||debug||7|
|info, informational||informational messages||informational||6|
|notice||normal but significant condition||notice||5|
|warning, warn||warning conditions||warning||4|
|error, err||error conditions||error||3|
|critical, crit||critical conditions||critical||2|
|alert||action must be taken immediately||alert||1|
|emerg, emergency, fatal||system is unusable||emergency||0|
If log.level cannot be mapped to an RFC5424 severity, then log.syslog.severity.name will be set to "alert" and log.syslog.severity.code will be set to "1".
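The table and its fallback rule can be expressed as a small lookup. This is a sketch of the mapping as documented here, not the pipeline's actual implementation: the function name is hypothetical, trimming and lowercasing the input is an assumption, and the codes are emitted as integers even though the fallback value is quoted as "1" in the text.

```python
# log.level value -> (lowercase RFC5424 severity name, RFC5424 severity code)
SEVERITY_BY_LEVEL = {
    "trace": ("debug", 7),
    "debug": ("debug", 7),
    "info": ("informational", 6),
    "informational": ("informational", 6),
    "notice": ("notice", 5),
    "warning": ("warning", 4),
    "warn": ("warning", 4),
    "error": ("error", 3),
    "err": ("error", 3),
    "critical": ("critical", 2),
    "crit": ("critical", 2),
    "alert": ("alert", 1),
    "emerg": ("emergency", 0),
    "emergency": ("emergency", 0),
    "fatal": ("emergency", 0),
}

# Levels that cannot be mapped fall back to "alert" / 1.
FALLBACK = ("alert", 1)


def syslog_severity(level):
    """Return the log.syslog.severity fields for a producer-defined level."""
    # Assumption: matching ignores case and surrounding whitespace.
    name, code = SEVERITY_BY_LEVEL.get(str(level).strip().lower(), FALLBACK)
    return {"log.syslog.severity.name": name, "log.syslog.severity.code": code}


print(syslog_severity("WARN"))
print(syslog_severity("verbose"))
```

Because unmapped levels surface as "alert", a producer using a non-standard level name will stand out in searches rather than silently disappearing.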
service.name is a combination of service and cluster. The intent for this field is to indicate not just the service that emitted the log entry, but also indicate what cluster in the overall system the log came from.
- For Kubernetes: this is the namespace name.
- For all others: this is usually the application name and cluster concatenated with a hyphen (-).
|It is important to have meaningful and clear cluster names to avoid confusion around the concatenated service name and cluster.|
service.type is the application name.
- For Kubernetes: this is the app label.
- For all others: this is the application name.
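The service.name and service.type rules above can be summarized in two small helpers. The function names and the example values ("exampleapp", "main", "example-namespace") are placeholders invented for this sketch. The non-Kubernetes case follows the "usually" form described above, so individual services may differ.

```python
def kubernetes_service_fields(namespace, app_label):
    """For Kubernetes: service.name is the namespace, service.type the app label."""
    return {"service.name": namespace, "service.type": app_label}


def generic_service_fields(application, cluster):
    """For all others: application and cluster joined with a hyphen."""
    return {
        "service.name": f"{application}-{cluster}",
        "service.type": application,
    }


print(kubernetes_service_fields("example-namespace", "exampleapp"))
print(generic_service_fields("exampleapp", "main"))
```

Note that an application or cluster name that itself contains a hyphen makes the concatenated service.name harder to split back apart, which is one reason clear cluster names matter.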
Often, diagnostic data needs to accompany the log entry. Diagnostic data gives the log entry context, more detail, and sometimes a path to reproduction. ECS defines fields for this diagnostic data.
host.name and respective fields in the host object.
ECS defines the labels field for custom key-value data.
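A minimal sketch of attaching custom key-value data under labels follows. The helper name and the label keys are hypothetical. Converting every value to a string is an assumption based on ECS treating label values as flat keywords rather than nested objects.

```python
def with_labels(event, **labels):
    """Return a copy of the event with custom key-value pairs under 'labels'."""
    out = dict(event)
    # Keep labels flat: one level of keys, with each value stored as a string.
    out["labels"] = {key: str(value) for key, value in labels.items()}
    return out


event = with_labels({"message": "Job finished"}, job_kind="refresh", shard=3)
print(event)
```

The original event is left untouched, so the same base event can be reused with different labels.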
These fields are commonly used, but have no clear analogue in ECS.
event.module, or a custom label in the labels object.
As of this writing (1.6.0), there is no great place for HTTP headers. (See this PR).
- The terms "attribute" and "field" are used interchangeably.
- Presence of the timestamp field (without the @) in Kibana indicates a problem in the logging pipeline and must be rectified.
- In Kibana, 180 characters shows comfortably on one line on a 1920x1080 widescreen monitor.
- Based on a set of rules, the base word is extracted from the word. For example, "running" is stemmed to "run" and "browsers" is stemmed to "browser".
- The WMF uses a number of programming languages in production. Each programming language has its own opinion on how to indicate logging level. The developer can customize the logging level further, which complicates finding errors. We see the need to agree on a defined set of log levels so that log consumers who are not always familiar with the programming language or developer preferences can find what they need. The Observability team has decided to standardize on RFC5424 Syslog severity.
- NOTSET indicates a problem either in the log producer or the pipeline and must be rectified.
- In some cases, this field can be generated by the pipeline.