Logstash
Logstash is a tool for managing events and logs. When used generically, the term encompasses a larger system of log collection, processing, storage and searching activities.
Overview ("ELK")
File:ELK Tech Talk 2015-08-20.pdf
File:Using Kibana4 to read logs at Wikimedia Tech Talk 2016-11-14.pdf
Various Wikimedia applications send log events to Logstash, which gathers the messages, converts them into json documents, and stores them in an Elasticsearch cluster. Wikimedia uses Kibana as a front-end client to filter and display messages from the Elasticsearch cluster.
Logstash
Logstash is a tool to collect, process, and forward events and log messages. Collection is accomplished via configurable input plugins, including raw socket/packet communication, file tailing, and several message bus clients. Once an input plugin has collected data, it can be processed by any number of filters which modify and annotate the event data. Finally, Logstash routes events to output plugins which can forward the events to a variety of external programs, including Elasticsearch, local files, and several message bus implementations.
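The input → filter → output flow described above can be sketched as a minimal Logstash pipeline configuration. This is an illustrative example, not the production configuration: the file path, field name, and Elasticsearch host are placeholders, though file, mutate, and elasticsearch are standard Logstash plugins.

```
# Hypothetical pipeline: tail an application log file, annotate each
# event with an extra field, and forward it to Elasticsearch.
input {
  file {
    path => "/var/log/myapp/app.log"
  }
}
filter {
  mutate {
    add_field => { "team" => "myapp" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```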
Elasticsearch
Elasticsearch is a multi-node Lucene implementation. The same technology powers CirrusSearch on WMF wikis.
Kibana
Kibana is a browser-based analytics and search interface for Elasticsearch that was developed primarily to view Logstash event data.
Systems feeding into logstash
See 2015-08 Tech talk slides
Writing new filters is easy.
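As a sketch of what a filter looks like, the hypothetical fragment below parses an Apache-style message into structured fields with the standard grok plugin and then normalizes a field with mutate; the "channel" field name is an assumption for illustration.

```
# Hypothetical filter: extract fields from an Apache combined log
# line, then lowercase the channel name for consistent querying.
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  mutate {
    lowercase => [ "channel" ]
  }
}
```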
Systems not feeding into logstash
- EventLogging (schema-validated, program-defined events), despite its name, uses a different pipeline.
- Varnish logs of the billions of pageviews of WMF wikis would require far more hardware. Instead we use Kafka to feed web requests into Hadoop. A notable exception to this rule: Varnish user-facing errors (HTTP status 500-599) are sent to Logstash to make debugging easier.
- MediaWiki logs usually go to both Logstash and log files, but a few log channels are excluded. You can check which in $wmgMonologChannels in InitialiseSettings.php.
Production Logstash
- Web interface
- logstash.wikimedia.org runs Kibana
- Authentication
- wikitech LDAP username and password and membership in one of the following LDAP groups: nda, ops, wmf
- Hosts
- logstash100[1-6] servers in Eqiad.
- Configuration
- The cluster contains two types of nodes:
- logstash100[1-3] provide a Logstash instance, a no-data Elasticsearch node, and an Apache vhost serving the Kibana application. The Apache vhosts also act as reverse proxies to the Elasticsearch cluster and perform LDAP-based authentication to restrict access to the potentially sensitive log information.
- logstash100[4-6] provide the Elasticsearch nodes forming the storage layer for log data.
- All hosts run Debian Jessie as the base operating system.
- The misc Varnish cluster provides SSL termination and load balancing for the Kibana application.
Kibana quick intro
- Start from one of the blue Dashboard links near the top; more are available from the Load icon near the top right.
- In "Events over time" click to zoom out to see what you want, or select a region with the mouse to zoom in.
- smaller time intervals are faster
- be careful: you may see no events at all because the selected time range extends into the future
- When you get lost, click the Home icon near the top right
- As an example query,
wfDebugLog( 'Flow', ...)
in MediaWiki PHP corresponds to
type:mediawiki AND channel:flow
- Switch to using mw:Structured logging and you can also query for
... AND level:ERROR
Read slide 11 onwards in the Tech Talk on ELK by Bryan Davis, which highlights features of the Kibana web interface.
API
The Elasticsearch API is accessible at https://logstash.wikimedia.org/elasticsearch/
Note: The _search endpoint can only be used without a request body (see task T174960). Use _msearch instead for complex queries that need a request body.
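Because _msearch takes a newline-delimited JSON body (a header line and a query line per search, with a trailing newline), it helps to build the body programmatically. The sketch below only constructs the request body; the index pattern and query are illustrative assumptions, and actually POSTing it to the endpoint above would additionally require LDAP credentials and a Content-Type of application/x-ndjson.

```python
import json

def msearch_body(searches):
    """Build an ndjson body for the Elasticsearch _msearch endpoint.

    searches: list of (header_dict, query_dict) pairs. Each dict is
    emitted on its own line; _msearch requires a trailing newline.
    """
    lines = []
    for header, query in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(query))
    return "\n".join(lines) + "\n"

# Hypothetical query: recent MediaWiki errors on the Flow channel.
body = msearch_body([
    ({"index": "logstash-*"},
     {"query": {"query_string": {
         "query": "type:mediawiki AND channel:flow AND level:ERROR"}},
      "size": 5}),
])
```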
Prototype (Beta) Logstash
- Web interface
- logstash-beta.wmflabs.org
- Hosts
- deployment-logstash2.eqiad.wmflabs
- Configuration
- It hosts a functional Logstash + Elasticsearch + Kibana stack at logstash-beta.wmflabs.org that aggregates log data produced by the beta cluster.
Gotchas
GELF transport
Make sure logging events sent to the GELF input don't have a "type" or "_type" field set, or, if set, that it contains the value "gelf". The gelf/logstash config discards any events that have a different value set for "type" or "_type". The final "type" seen in Kibana/Elasticsearch will be taken from the "facility" element of the original GELF packet, so the application sending the log data to Logstash should set "facility" to a reasonably unique value that identifies your application.
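A minimal sketch of a well-formed GELF payload, assuming the common GELF 1.1 UDP transport (gzip-compressed JSON): note that "facility" is set while "type"/"_type" are deliberately absent. The host and facility values are hypothetical; sending the packet would mean writing it to a UDP socket aimed at the Logstash GELF input.

```python
import gzip
import json

# Hypothetical GELF 1.1 event. "facility" becomes the Elasticsearch
# "type", so use a value unique to your application. Do NOT set
# "type" or "_type" yourself, or the event may be discarded.
event = {
    "version": "1.1",
    "host": "appserver.example",       # placeholder host name
    "short_message": "user login failed",
    "level": 4,                        # syslog severity: warning
    "facility": "myapp",               # placeholder application name
}

# GELF UDP packets are typically gzip-compressed JSON documents.
packet = gzip.compress(json.dumps(event).encode("utf-8"))
```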
See also
- mw:Manual:Structured logging (MediaWiki part of the job to feed into Logstash)
- Logs#mw-log (the old method of viewing logs)