Incidents/20161227-ores

Revision as of 17:45, 8 April 2022 by imported>Krinkle (Krinkle moved page Incident documentation/20161227-ores to Incidents/20161227-ores)

Summary

For several weeks, ORES was unable to score a growing proportion of Wikidata edits.

Timeline

  • The quantity change API in Wikibase was deployed in mid-November (probably on November 18; phab:T133042). Pywikibase was not updated to match and failed on items that have quantity statements without boundaries. Few edits were affected at first.
  • The failure rate grew over the following weeks.
  • December 27, 00:16 UTC: The problem is reported ("Quantity changes broke ORES").
  • 00:53: A fix is pushed to ores-experiment.wmflabs.org and confirmed to resolve the issue.
  • 05:00: The fix is pushed to the beta cluster and confirmed to resolve the issue there.
  • <SAL> [2016-12-27T05:06:06Z] <Amir1> starting deploy of ores:228b9b4 in canary nodes (T154168)
  • <SAL> [2016-12-27T05:14:37Z] <Amir1> starting deploy of ores:228b9b4 in all nodes (T154168)
  • <SAL> [2016-12-27T05:25:27Z] <Amir1> finished deploy of ores:228b9b4 in all nodes (T154168)
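
The failure mode in the first timeline entry can be illustrated with a minimal sketch. It assumes the Wikibase JSON shape for quantity values, in which "amount" is always present and "lowerBound" and "upperBound" may be absent. The names Quantity and parse_quantity are hypothetical and chosen for illustration; this is not the actual Pywikibase code or the fix deployed in ores:228b9b4.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    """Hypothetical container for a parsed quantity value."""
    amount: Decimal
    unit: str
    lower_bound: Optional[Decimal]
    upper_bound: Optional[Decimal]


def _to_decimal(raw: Optional[str]) -> Optional[Decimal]:
    # Missing bounds are passed through as None instead of raising.
    return Decimal(raw) if raw is not None else None


def parse_quantity(value: dict) -> Quantity:
    """Parse a quantity value, treating both bounds as optional."""
    return Quantity(
        amount=Decimal(value["amount"]),
        unit=value.get("unit", "1"),
        # dict.get returns None when the key is absent, whereas
        # value["lowerBound"] would raise KeyError on unbounded quantities.
        lower_bound=_to_decimal(value.get("lowerBound")),
        upper_bound=_to_decimal(value.get("upperBound")),
    )
```

A parser that indexes the bounds directly works for every older item and only fails once unbounded quantities start to appear, which is consistent with a failure rate that starts small and grows.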

Conclusions

Unexpected breaking changes can happen at any time. We need better monitoring of the failure ratio.

Actionables

  • Clean up failure ratio monitoring and set up an alarm that fires when the ratio exceeds a certain threshold (task T154175)
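
A threshold alarm of this kind could look like the sketch below. It is an illustration under stated assumptions, not the monitoring that ORES uses: the class FailureRatioMonitor, the sliding window size, and the 5% default threshold are all hypothetical.

```python
from collections import deque


class FailureRatioMonitor:
    """Track the failure ratio over the most recent scoring requests."""

    def __init__(self, window_size: int = 1000, threshold: float = 0.05):
        # deque with maxlen drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        """Record the outcome of one scoring request."""
        self.outcomes.append(failed)

    def failure_ratio(self) -> float:
        """Return failures divided by requests in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alarm(self) -> bool:
        """Return True when the ratio is strictly above the threshold."""
        return self.failure_ratio() > self.threshold
```

A ratio over a sliding window, rather than an absolute error count, keeps the alarm meaningful as traffic changes and would surface a slowly growing failure rate like the one in this incident.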