This page contains historical information. It is probably no longer true.
This page documents the existing monitoring infrastructure for fundraising and tracks desired monitoring functionality.
Existing monitoring infrastructure
See RT ticket #405.
- Nagios check for alive-ness
- Nagios check for failed builds
- Note: some scripts run by Hudson need to be modified to return a non-zero exit status when they do not complete properly (e.g. the send/receive mail scripts for civimail)
- Nagios check for too many files in build folders (if the limit of 63999 gets hit, builds will fail)
- Nagios check for queues filling up too fast
- Service communication times
- Nagios checks for timeouts/unacceptably high communication times
- 3rd party service accessibility from payments cluster
- Nagios check for communications access to MaxMind/PayPal
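
As an illustration of the build-folder check listed above, here is a minimal sketch of a Nagios-style plugin in Python. It relies only on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the 63999 limit mentioned on this page. The function names and the warning and critical thresholds are assumptions chosen for the example, not values taken from the fundraising setup.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Limit cited on this page; builds fail once a folder reaches it.
HARD_LIMIT = 63999


def classify(count, warn_at, crit_at):
    """Map an entry count to a Nagios status code."""
    if count >= crit_at:
        return CRITICAL
    if count >= warn_at:
        return WARNING
    return OK


def check_build_dir(path, warn_at=50000, crit_at=60000):
    """Count the entries in a build folder and return (status, message).

    warn_at and crit_at are hypothetical thresholds below HARD_LIMIT.
    """
    try:
        count = len(os.listdir(path))
    except OSError as exc:
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (path, exc)
    status = classify(count, warn_at, crit_at)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    message = "%s - %d entries in %s (limit %d)" % (label, count, path, HARD_LIMIT)
    return status, message


def main(argv):
    """Entry point: expects exactly one argument, the build folder path."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_build_files.py <build-dir>")
        return UNKNOWN
    status, message = check_build_dir(argv[0])
    print(message)
    return status
```

To run this as a plugin, the script would import `sys` and end with `sys.exit(main(sys.argv[1:]))`, so that Nagios receives the status as the process exit code. The same exit-code convention applies to the Hudson scripts noted above: a script that fails must exit non-zero for the failure to be detected.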