User:Cwhite/Logstash/ECS Schema Guide for Developers
Rationale
Often, software is opinionated about what constitutes a log entry. Since WMF's centralized logging infrastructure became generally available, it has experienced incredible organic growth. This growth presents challenges in the storage, ingest, and presentation domains. One such issue is that there is no definition of which fields can be set or how many, and consequently no type information is provided. Without control over field types, Elasticsearch must guess each field's type, making type collisions a regular occurrence. Without control over which fields are available, fields remain largely undefined and meaningless to an outside observer. As we strive to boost signal, reduce noise, scale, simplify, and improve the user experience of the centralized logging system, we see the need to agree on a Common Logging Schema. The Observability team has evaluated options and decided to adopt the Elastic Common Schema (ECS).
Required Fields
ECS Version
ECS logs are identified by including the ECS version in the structured log event. This field is ecs.version and should contain the ECS version the log event is targeting.
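As a minimal sketch, a Python producer could attach the version when it serializes each event. The value "1.6.0" is only an example, taken from the ECS release mentioned under Missing Fields. Use whichever ECS version your events actually target. The helper name make_event is hypothetical.

```python
import json

# Example value only: set this to the ECS release your events are written against.
ECS_VERSION = "1.6.0"


def make_event(message: str) -> dict:
    """Build a minimal structured log event that identifies itself as ECS."""
    # The dotted ECS field "ecs.version" is expressed as a nested JSON object.
    return {"ecs": {"version": ECS_VERSION}, "message": message}


line = json.dumps(make_event("Cache warmed"))
print(line)
```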
Common Fields
The structured log object (a JSON object) consists of a set of attributes. There are a few common attributes[1] that almost every log source will want to populate. When possible, please follow the field content recommendations in this document.
Timestamp
Ideally, the timestamp attribute contains an ISO-8601 formatted timestamp indicating the time the log was generated in UTC. This field will be translated to the native date type and moved to @timestamp.[2]
If not provided, the logging pipeline will generate the @timestamp field indicating the time it was received by the logging pipeline.
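One way to produce such a value in Python is sketched below. The helper name iso8601_utc is hypothetical. Millisecond precision with a trailing "Z" is an illustrative choice and not a requirement stated on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def iso8601_utc(moment: Optional[datetime] = None) -> str:
    """Return an ISO-8601 timestamp in UTC with millisecond precision."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Aware datetimes are converted to UTC; astimezone() treats naive
    # datetimes as local time.
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


event = {"timestamp": iso8601_utc(), "message": "Cache warmed"}

# A time recorded at UTC+2 is normalized to UTC.
print(iso8601_utc(datetime(2020, 1, 2, 5, 0, tzinfo=timezone(timedelta(hours=2)))))
# 2020-01-02T03:00:00.000Z
```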
Message
message is a short summary or message optimized for viewing in a log viewer.[3] When a message is not provided, it can be constructed from other fields to provide a human-readable summary of the log entry.
The message field is often the first field a user will look at when searching for diagnostic information. While there are no restrictions on what data is allowed in the message field, we recommend optimizing the field for human consumption by keeping the message short and putting diagnostic data in the proper place.[4]
How to tell if a piece of information is diagnostic data and not a good fit for the message field:
- Would this information be glossed over when a user reads the message?
- Is the piece of information useful for measurement?
- Is the piece of information useful to correlate with other log entries?
- Would it take multiple lines to render the data in the message?
If the answer to any of the above questions is "yes," consider moving the datapoint(s) to their own field as defined in the ECS documentation or to the labels object.
Common datapoints with their own fields:
- Event (UU)IDs: event.id field.
- Stack traces: error.stack_trace field.
- HTTP data: http object field.
- URL data: url object field.
- (... this list is incomplete)
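The Python sketch below shows the idea. The message stays a short summary, while the identifier, stack trace, URL, and HTTP details move into their own fields. The field names url.full, http.request.method, and http.response.status_code come from the ECS field reference rather than from this page. build_error_event is a hypothetical helper, and the ECS version is an example value.

```python
import json
import traceback
import uuid


def build_error_event(method: str, full_url: str, status: int, exc: BaseException) -> dict:
    """Keep `message` short; move diagnostic data into dedicated ECS fields."""
    return {
        "ecs": {"version": "1.6.0"},  # example version
        "log": {"level": "error"},
        # Human-readable summary only: no IDs, URLs, or stack traces here.
        "message": f"Upstream request failed with HTTP {status}",
        # Identifier that lets this entry be correlated with others.
        "event": {"id": str(uuid.uuid4())},
        # Multi-line diagnostic data goes to its own field.
        "error": {
            "stack_trace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
        },
        "url": {"full": full_url},
        "http": {
            "request": {"method": method},
            "response": {"status_code": status},
        },
    }


try:
    raise RuntimeError("backend timed out")
except RuntimeError as err:
    event = build_error_event("GET", "https://example.org/w/api.php", 504, err)

print(json.dumps(event, indent=2))
```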
Log Level[5]
The log.level field is a human-readable string and is indexed as a keyword. If log.level is omitted, the logging pipeline will attempt to populate it with:
- The value at log.syslog.severity.name.
- The human-readable definition of log.syslog.severity.code.
- NOTSET if no other level indicator could be found.[6]
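The fallback order can be expressed as a small Python function. This only models the documented behaviour for illustration. It is not the pipeline's actual implementation, and fill_log_level is a hypothetical name. The sketch assumes that the "human-readable definition" of a severity code is its lowercase RFC5424 severity name from the table below.

```python
# Position in this list equals the RFC5424 severity code (0 = emergency).
RFC5424_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def fill_log_level(event: dict) -> dict:
    """Populate log.level using the fallback order described above."""
    log = event.setdefault("log", {})
    if "level" in log:
        return event  # The producer already set a level.
    severity = log.get("syslog", {}).get("severity", {})
    code = severity.get("code")
    if "name" in severity:
        log["level"] = severity["name"]
    elif isinstance(code, int) and 0 <= code < len(RFC5424_NAMES):
        log["level"] = RFC5424_NAMES[code]
    else:
        log["level"] = "NOTSET"  # Signals a producer or pipeline problem.
    return event
```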
For log producers that emit JSON-formatted messages and define their own level, log.level is used to populate log.syslog.severity.name and log.syslog.severity.code per this table:
Lowercase log.level | RFC5424 definition | Lowercase RFC5424 Severity | RFC5424 Severity code | PHP[7] | Java[8] | NodeJS[9] | Python[10] | Syslog[11] |
---|---|---|---|---|---|---|---|---|
trace, debug | debug-level messages | debug | 7 | | | | | |
info, informational | informational messages | informational | 6 | | | | | |
notice | normal but significant condition | notice | 5 | | | | | |
warning, warn | warning conditions | warning | 4 | | | | | |
error, err | error conditions | error | 3 | | | | | |
critical, crit | critical conditions | critical | 2 | | | | | |
alert | action must be taken immediately | alert | 1 | | | | | |
emerg, emergency, fatal | system is unusable | emergency | 0 | | | | | |
If log.level cannot be mapped to an RFC5424 severity, then log.syslog.severity.name will be set to "alert" and log.syslog.severity.code will be set to 1.
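Below is a Python sketch of the mapping, transcribed from the table and the fallback rule above. It lowercases the incoming level before lookup, matching the table's "Lowercase log.level" column. syslog_severity is a hypothetical name, and the real pipeline logic may differ in details.

```python
# log.level value (lowercased) -> (RFC5424 severity name, RFC5424 code),
# transcribed from the table above.
LEVEL_TO_SEVERITY = {
    "trace": ("debug", 7), "debug": ("debug", 7),
    "info": ("informational", 6), "informational": ("informational", 6),
    "notice": ("notice", 5),
    "warning": ("warning", 4), "warn": ("warning", 4),
    "error": ("error", 3), "err": ("error", 3),
    "critical": ("critical", 2), "crit": ("critical", 2),
    "alert": ("alert", 1),
    "emerg": ("emergency", 0), "emergency": ("emergency", 0), "fatal": ("emergency", 0),
}

# Levels that cannot be mapped fall back to "alert" / 1.
UNMAPPED = ("alert", 1)


def syslog_severity(level: str) -> dict:
    """Return the log.syslog.severity object for a producer-defined level."""
    name, code = LEVEL_TO_SEVERITY.get(level.lower(), UNMAPPED)
    return {"name": name, "code": code}


event = {"log": {"level": "WARN"}}
event["log"]["syslog"] = {"severity": syslog_severity(event["log"]["level"])}
```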
Service Name[12]
service.name is a combination of service and cluster. The intent of this field is to indicate not just the service that emitted the log entry, but also which cluster in the overall system the log came from.
- For Kubernetes: this is the namespace name.
- For all others: this is usually the application name and cluster concatenated with a hyphen (-).
Examples:
- elasticsearch-logging
- blazegraph-wdqs
- elasticsearch-wdqs
- mediawiki-api_appserver
- mediawiki-jobrunner
- memcached-memcached_gutter
- memcached-memcached
It is important to have meaningful and clear cluster names to avoid confusion around the concatenated service name and cluster.
Service Type[12]
service.type is the application name.
- For Kubernetes: this is the app label.
- For all others: this is the application name.
Examples:
- elasticsearch
- kafka
- blazegraph
- mediawiki
- restbase
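The Python sketch below expresses both conventions as one hypothetical helper, service_fields. It assumes the caller already knows its cluster, or its Kubernetes namespace and app label. This page does not say how those values are discovered, and in some cases the pipeline can generate these fields itself.

```python
from typing import Dict, Optional


def service_fields(app: str, cluster: Optional[str] = None,
                   k8s_namespace: Optional[str] = None) -> Dict[str, str]:
    """Derive service.name and service.type following the conventions above.

    For Kubernetes workloads, pass the namespace and the app label as `app`.
    Otherwise, pass the application name and its cluster.
    """
    if k8s_namespace:
        name = k8s_namespace
    elif cluster:
        name = f"{app}-{cluster}"  # application and cluster joined by a hyphen
    else:
        raise ValueError("need a cluster (or a Kubernetes namespace)")
    return {"name": name, "type": app}


print(service_fields("memcached", cluster="memcached_gutter"))
# {'name': 'memcached-memcached_gutter', 'type': 'memcached'}
```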
Diagnostic Data
Often, one will need diagnostic data to accompany the log entry. Diagnostic data gives the log entry context, more detail, and sometimes a path to reproduction. ECS defines fields to meet this need.
Hostname
Use host.name and the related fields in the host object.
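For a Python producer, the standard library offers one simple way to fill host.name. The sketch below uses a hypothetical helper, host_fields. This page does not specify whether the short hostname or the FQDN is preferred.

```python
import socket


def host_fields() -> dict:
    """Return a minimal ECS host object for the current machine."""
    # socket.gethostname() may return a short name or an FQDN depending on
    # system configuration.
    return {"host": {"name": socket.gethostname()}}


print(host_fields())
```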
URL Object
HTTP Object
Custom Fields
ECS defines the labels field for custom key-value data.
The labels field does not support nested objects. All keys and values are stored as keyword.
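Because of these limits, a producer may want to validate custom data before emitting it. Below is a minimal Python sketch using a hypothetical helper, to_labels. It rejects nested values and converts scalars to strings, on the assumption that string values are the clearest match for keyword storage.

```python
from typing import Any, Dict


def to_labels(custom: Dict[str, Any]) -> Dict[str, str]:
    """Flatten custom key-value data into an ECS labels object."""
    labels = {}
    for key, value in custom.items():
        # labels cannot hold nested objects, so refuse containers outright.
        if isinstance(value, (dict, list, tuple, set)):
            raise ValueError(f"label {key!r} must be a scalar, got {type(value).__name__}")
        # Values are stored as keyword, so send them as strings.
        labels[str(key)] = str(value)
    return labels


event = {"message": "Job finished", "labels": to_labels({"wiki": "enwiki", "attempt": 3})}
```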
Deprecated Fields
These fields are commonly used, but have no clear analogue in ECS.
Channel
Use log.logger, event.module, or a custom label in the labels object.
Type
Use service.type and/or service.name.
Program
Use service.type and/or service.name.
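When converting an existing producer, the renames can be done in one place. The Python sketch below is illustrative only. migrate_legacy_fields is hypothetical, and it picks log.logger for channel, which is one of the three options above. It maps both type and program to service.type and leaves service.name to be set separately.

```python
import copy
from typing import Any, Dict

DEPRECATED = ("channel", "type", "program")


def migrate_legacy_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Move deprecated fields to ECS fields (one possible mapping)."""
    event = copy.deepcopy({k: v for k, v in record.items() if k not in DEPRECATED})
    if "channel" in record:
        # Chosen option: log.logger. event.module or a label are alternatives.
        event.setdefault("log", {})["logger"] = record["channel"]
    legacy_app = record.get("type") or record.get("program")
    if legacy_app:
        # Keep an explicitly set service.type if the record already has one.
        event.setdefault("service", {}).setdefault("type", legacy_app)
    return event


print(migrate_legacy_fields({"channel": "DBQuery", "type": "mediawiki", "message": "Slow query"}))
# {'message': 'Slow query', 'log': {'logger': 'DBQuery'}, 'service': {'type': 'mediawiki'}}
```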
Missing Fields
HTTP Headers
As of this writing (ECS 1.6.0), there is no good place for HTTP headers (see this PR).
Notes
- [1] The terms "attribute" and "field" are used interchangeably.
- [2] Presence of the timestamp field (without the @) in Kibana indicates a problem in the logging pipeline and must be rectified.
- [3] In Kibana, 180 characters show comfortably on one line on a 1920x1080 widescreen monitor.
- [4] The message field is analyzed as a natural language text type. This means that the message field will be:
  - tokenized -- the text is broken up on whitespace, stop words, and optionally non-letter characters
  - filtered -- the tokens are downcased and stemmed. Based on a set of rules, the base word is extracted from the word. For example, "running" is stemmed to "run" and "browsers" is stemmed to "browser".
  - indexed -- the filtered tokens are then indexed into an inverted index indicating which documents the token can be found in.
- [5] The WMF uses a number of programming languages in production. Each programming language has its own opinion on how to indicate logging level. Logging level can be customized by the developer, further complicating the issue of finding errors. We see the need to agree on a defined set of log levels to make it easier for log consumers who are not always familiar with the programming language or developer preferences to find what they need. The Observability team has decided to standardize on RFC5424 Syslog severity.
- [6] NOTSET indicates a problem either in the log producer or the pipeline and must be rectified.
- [7] https://www.php-fig.org/psr/psr-3/
- [8] https://en.wikipedia.org/wiki/Log4j#Log4j_log_levels
- [9] https://github.com/trentm/node-bunyan#levels
- [10] https://docs.python.org/3/library/logging.html#levels
- [11] https://tools.ietf.org/html/rfc5424#section-6.2.1
- [12] In some cases, this field can be generated by the pipeline.