Graphite was unable to serve data properly because of large queries issued from Grafana dashboards.


  • 16:16 - 502 bad gateway from graphite
  • 16:18 - investigation begins
  • 16:31 - recovery, root cause still unknown, large query suspected
  • 16:47 - graphite-index cronjob stopped; suspected as a factor, later ruled out
  • 17:57 - the offending queries are found and the related grafana dashboard deleted
  • 18:54 - offending client banned from apache on graphite1001


Graphite does not appear to include query cancellation or timeout capabilities for local queries, so queries involving many time series can occupy all uwsgi workers, resulting in "bad gateway" responses from apache. In addition, grafana clients don't seem to reload a dashboard when the dashboard definition itself is updated. As a result, clients keep requesting the same (in this case problematic) dashboard, and a server-side ban is needed to stop them.
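
One possible mitigation for the failure mode described above is to reject a query before it reaches a worker if it would expand to too many time series. The following is only a minimal sketch of that idea, not Graphite's actual behaviour or API: the names `matches`, `check_query` and `MAX_SERIES` are hypothetical, the limit is arbitrary, and only the `*`, `?` and `[...]` wildcards are handled (Graphite's `{a,b}` alternation is not).

```python
from fnmatch import fnmatchcase

# Hypothetical limit, deliberately small for illustration.
MAX_SERIES = 3


def matches(pattern, metric):
    """Match a dotted metric name node by node, so '*' never crosses a dot."""
    p_nodes = pattern.split(".")
    m_nodes = metric.split(".")
    if len(p_nodes) != len(m_nodes):
        return False
    return all(fnmatchcase(m, p) for p, m in zip(p_nodes, m_nodes))


def check_query(pattern, known_metrics, limit=MAX_SERIES):
    """Return the matched series, or raise ValueError if there are too many."""
    matched = [m for m in known_metrics if matches(pattern, m)]
    if len(matched) > limit:
        raise ValueError(
            "query matches %d series, limit is %d" % (len(matched), limit)
        )
    return matched
```

With a made-up metric list such as `["servers.a.cpu", "servers.b.cpu", "servers.c.cpu", "servers.d.cpu", "servers.a.mem"]`, `check_query("servers.a.*", ...)` returns the two matching series, while `check_query("servers.*.cpu", ...)` raises because four series exceed the limit of three. A guard like this bounds the size of a query but not its run time, so it complements rather than replaces a worker-level timeout.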


  • limit the impact of heavy/large graphite queries (bug T116767)