WMDE/Wikidata/Alerts
Revision as of 16:48, 11 June 2020
Icinga
Wikidata-related Icinga alerts are defined in Puppet: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/icinga/manifests/monitor/wikidata.pp
The status of alerts can be seen at https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=wikidata
All alerts report to the "wikidata" contact group, which can be seen at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/nagios_common/files/contactgroups.cfg#52
Within WMDE there is a wikidata-monitoring mailing list you can subscribe to. Notifications also land in the wikidata IRC channel.
Grafana
One of the Icinga checks monitors the alert status of the wikidata alerts dashboard on Grafana.
The dashboard can be found here: https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts
Maxlag: Above 10 for 1 hour
In the past this has been caused by:
- high dispatch lag from waiting for replication, which was slowed by an overloaded db server running a long-running query that was not correctly killed
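As an illustration of what "above 10 for 1 hour" means, here is a minimal sketch of that condition over a list of timestamped maxlag samples. It is not how Grafana evaluates the alert; the function name `sustained_above` and the sample layout are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta


def sustained_above(samples, threshold=10.0, duration=timedelta(hours=1)):
    """Return True if every sample in the trailing `duration` window exceeds `threshold`.

    `samples` is a list of (timestamp, value) tuples sorted by timestamp.
    This helper is hypothetical and only illustrates the alert condition.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - duration
    # Without data covering the whole window we cannot claim "for 1 hour".
    if samples[0][0] > start:
        return False
    window = [value for ts, value in samples if ts >= start]
    return all(value > threshold for value in window)
```

A single dip to or below the threshold inside the window makes the function return False, which matches the idea that the alert should only fire on sustained lag rather than short spikes.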
Edits: Wikidata edit rate
The edit rate on Wikidata can be a good indicator that something somewhere is wrong, although it will not always indicate exactly what that is.
You can view the edits dashboard at https://grafana.wikimedia.org/d/000000170/wikidata-edits
If maxlag is high, that may explain a low edit rate.
You may want to investigate what is going on with the API (all edits go via the API) at https://grafana.wikimedia.org/d/000000559/api-requests-breakdown?refresh=5m&orgId=1&var-metric=p50&var-module=wb*
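To check the current maxlag value directly, one option is to ask the MediaWiki action API with a negative `maxlag` parameter, which makes it answer with the maxlag error that carries the current lag. The sketch below assumes the public endpoint `https://www.wikidata.org/w/api.php` and assumes the error payload contains `code` and `lag` fields; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

# Assumption: the public Wikidata action API endpoint.
API_URL = "https://www.wikidata.org/w/api.php"


def lag_from_response(payload):
    """Return the lag in seconds from a maxlag error payload, or None if absent."""
    error = payload.get("error", {})
    if error.get("code") == "maxlag":
        return error.get("lag")
    return None


def fetch_current_lag(api_url=API_URL, timeout=10):
    """Query the API with maxlag=-1 so that it reports the current lag."""
    url = f"{api_url}?action=query&format=json&maxlag=-1"
    # Hypothetical User-Agent; use one that identifies you in practice.
    request = urllib.request.Request(
        url, headers={"User-Agent": "maxlag-check-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return lag_from_response(json.load(response))
```

Calling `fetch_current_lag()` performs a live request; `lag_from_response` can be used on its own with a payload you already have.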
API: Max p95 execute time for write modules
Investigate the wb API at https://grafana.wikimedia.org/d/000000559/api-requests-breakdown?refresh=5m&orgId=1&var-metric=p50&var-module=wb*
In the past this has been caused by:
- s8 db being overloaded, often for a fixable reason
- Memcached being overloaded, in the past indicating UBNs
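The dashboard link in this section selects `var-metric=p50`, while the alert is about p95. A small sketch of rewriting that query parameter is shown below. It assumes the dashboard accepts `p95` as a value for `var-metric`, which is not confirmed here; the helper name `with_metric` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DASHBOARD = (
    "https://grafana.wikimedia.org/d/000000559/api-requests-breakdown"
    "?refresh=5m&orgId=1&var-metric=p50&var-module=wb*"
)


def with_metric(url, metric):
    """Return `url` with its var-metric query parameter set to `metric`."""
    parts = urlsplit(url)
    query = [
        (key, metric if key == "var-metric" else value)
        for key, value in parse_qsl(parts.query)
    ]
    # Keep "*" unescaped so the module wildcard stays readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe="*")))


# Assumption: "p95" is a valid value for var-metric on this dashboard.
p95_url = with_metric(DASHBOARD, "p95")
```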
Oozie Job
Contact WMF analytics to investigate