You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Logs: Difference between revisions

From Wikitech-static
imported>Krinkle
(rewrite, update outdated docs, remove various channels that no longer exist)
imported>Krinkle
No edit summary
Revision as of 13:38, 10 June 2022

This page is about server log files. For IRC channel logs, see e.g. http://wm-bot.wmflabs.org/

Logs of several sorts are generated across the cluster and collected in a single location replicated on some machines. Privileged users can explore most logs through the OpenSearch Dashboards front-end at https://logstash.wikimedia.org/.

The SRE Observability team is working on a common log format called ECS; see the linked doc and intro slides. ECS documentation can be found at https://doc.wikimedia.org/ecs/

mwlog1002:/srv/mw-log/

These record wfDebugLog() and similar calls in MediaWiki (see especially mw:Structured logging). All cluster-wide logs are aggregated here (configured through $wmgUdp2logDest, see also wmgMonologChannels). There are dozens of log files, which amount to around 15 GB compressed per day as of April 2015. Some are not sent to logstash (settings) and some are sampled; log archives are stored for a variable amount of time, up to 90 days (per data retention guideline). Note that logstash also records the context data for structured logging, so it might contain significantly more information than the files.

Source: All appserver clusters.

Directories:

  • archive/: Directory holding a limited number of previous days of the same logs (compressed once a day).

General channels:

  • exception.log: Fatal exceptions that receive either a localised "Internal error" page, or a Wikimedia Error page rendered by php-wmerrors.
    • Error pages report a request ID, e.g. "[d84af39036] 2011-04-01: Fatal exception of type MWException".
    • To find the complete stack trace, search for the request ID (d84af39036 in this example) in exception.log, or in the exception log under the "mediawiki" dashboard in Grafana.
  • apache2.log: aggregated Apache error logs, see #syslog
  • api.log: API requests and their parameters (including redacted POST payloads, and temporary PII). This log used to be sampled but no longer is (sampling was dropped during 2014-2015); as of Nov 2015 it is flushed every 30 days.
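
The request ID lookup described for exception.log can be sketched as a small script. This is a minimal sketch, not an existing Wikimedia tool: the function name find_request_id is hypothetical, and it assumes that archived copies live somewhere below the log directory, are gzip-compressed, and end in ".gz".

```python
import gzip
from pathlib import Path


def find_request_id(log_dir, request_id, pattern="exception.log*"):
    """Yield (file name, line) for every log line that mentions request_id.

    Hypothetical helper: searches the current log and, recursively, any
    archived copies whose names match `pattern`.
    """
    for path in sorted(Path(log_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # Assumption: archived logs are gzip-compressed and end in ".gz".
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if request_id in line:
                    yield path.name, line.rstrip("\n")
```

Calling find_request_id("/srv/mw-log", "d84af39036") would then yield each matching line together with the file it came from, so the stack trace can be read in context.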

Specific components:

  • antispoof.log: Collision check passes and failures from the AntiSpoof extension. This checks for strings that look the same using different Unicode characters (such as spoofed usernames).
  • badpass.log: Failed login attempts to wikis.
  • captcha.log: Captcha attempts (both failed and successful attempts).
  • centralauth.log (2013-05-09–), centralauth-bug39996.log, centralauthrename.log (2014-07-14–): (temporary) debug logs for bugzilla:35707, bugzilla:39996, bugzilla:67875. In theory, rare events; can include username and page visited/request made.
  • CirrusSearch.log: Logs various info concerning Cirrus (update/query failures and various debug info). Cirrus now uses the analytics platform to log search requests (Analytics/Data/Cirrus).
  • CirrusSearchSlowRequests.log: Logs slow requests.
  • CirrusSearchChangeFailed.log: Logs update failures.
  • external.log: ExternalStore blob fetch failures (see External storage)
  • imagemove.log: Page renames in the File namespace that take place (both failed and successful renames).
  • memcached.log: Memcached for MediaWiki (WANObjectCache, misc ephemeral data, rate limiting counters, advisory locks).
  • poolcounter.log: PoolCounter failures (connection problems, excess queue size, wait timeouts).
  • redis.log: Redis query and connection failures (might involve sessions, job queues, and some other assorted features).
  • resourceloader.log: Exceptions related to ResourceLoader.
  • runJobs.log: Tracks job queue activity, including errors (both failed and successful runs).
    • Can be used to produce stats on jobs run on the various wikis, e.g. with Tim's perl ~/job-stats.pl runJobs.log.
  • swift-backend.log: Errors in the SwiftFileBackend class (timeouts and HTTP 500 type errors for file and listing reads/writes).
  • slow-parse.log (since May 2012; 6 months archive)
  • spam.log: SimpleAntiSpam honeypot hits from bots (attempted user actions are discarded).
  • XWikimediaDebug.log: see X-Wikimedia-Debug#Debug logging.

syslog

The syslog for all application servers can be found in apache2.log on mwlog1001 or in /srv/syslog/apache.log on centrallog1001. This includes things like segmentation faults.

5xx errors

5xx errors are available at centrallog1001.eqiad.wmnet:/srv/weblog/webrequest/5xx.json, and in logstash through the Varnish 5xx Logstash dashboard.
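
A quick tally of the 5xx file by status code could look like the sketch below. The file layout is an assumption, not something this page documents: the sketch supposes one JSON object per line and a status field named "http_status", and count_statuses is a hypothetical helper. Adjust status_field to whatever the records actually contain.

```python
import json
from collections import Counter


def count_statuses(lines, status_field="http_status"):
    """Count records per HTTP status, skipping blank or unparseable lines.

    Assumption: each line is a standalone JSON object and the status is
    stored under `status_field` (a hypothetical field name).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and status_field in record:
            counts[str(record[status_field])] += 1
    return counts
```

Any iterable of lines works as input, for example an open file handle on 5xx.json, so the file never has to be loaded into memory at once.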

deploy1002:/var/log/l10updatelog/l10update.log

Source: scap

  • l10update.log: Error log for LocalisationUpdate runs.

vanadium:/var/log/eventlogging/

  • various: Logs of EventLogging entries. Potentially useful, in case their transformation into SQL and MongoDB records fails.

Request logs

Logs of any kind of request, e.g. viewing a wiki page, editing, using the API, loading an image.

  • Analytics/Data/Webrequest: "wmf.webrequest" is the name of an unsampled request archive in Hive. We started deleting older wmf.webrequest data in March 2015 and currently keep 62 days.

centrallog1001:/srv/weblog/webrequest

The cache (outer layer) request logs; see Squid logging#Log files.

The 1:1000 sampled logs are used for about 15 monthly and quarterly reports and day-to-day operations (source).
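
Counts taken from a 1:1000 sample have to be scaled up to estimate real traffic. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the error figure assumes requests are sampled independently, which this page does not state.

```python
import math


def estimate_total(sampled_count, sampling_rate=1000):
    """Scale a count from a 1:sampling_rate sample up to an estimated total."""
    return sampled_count * sampling_rate


def relative_error(sampled_count):
    """Approximate relative standard error of the estimate.

    Assumption: independent sampling at a low rate, so the error is
    roughly 1 / sqrt(number of sampled hits).
    """
    if sampled_count <= 0:
        raise ValueError("need at least one sampled hit")
    return 1 / math.sqrt(sampled_count)
```

For instance, 400 sampled hits suggest about 400,000 real requests, with a relative error of roughly 5%; counts based on only a handful of sampled lines are correspondingly unreliable.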

Beta cluster

The mw:Beta cluster in labs has a similar logging configuration to production. Various server logs are written to the remote syslog server deployment-mwlog01.deployment-prep.eqiad1.wikimedia.cloud in /srv/mw-log.

Apache access logs are written to /var/log/apache2/other_vhosts_access.log on each beta cluster host.

See mw:Beta_Cluster#Testing_changes_on_Beta_Cluster for information on how to access the beta logstash web UI.

Mailservers

exim logs are retained for 90 days (see phabricator:T167333).

Dead

Lucene (search)

Each host logs at /a/search/log/log (now less noisy), see Search#Trouble on how to identify which host serves what pool etc.

fenari:/home/wikipedia/syslog

Source: All apaches

  • apache.log: Error log of all apaches (includes stderr of PHP, so PHP Notices, PHP Warnings etc.)
    • Use fatalmonitor to aggregate this into a (tailing) report
    • This has been deprecated in favor of fluorine:/a/mw-log/apache2.log and logstash.

fenari:/var/log/

Source: Machine-specific logs

External links