Portal:Cloud VPS/Admin/Alerts

For any new alert runbook, create a runbook page instead (see Portal:Cloud_VPS/Admin/Runbooks).

Alerts that may be sent to WMCS-team (or WMCS-bots as of now):


  • Nova-Fullstack (labnet) - Launch a "full" test of instance creation
  • nova-network (labnet) - handle dynamic NAT and networking gateway
  • nova-api (labnet) - main API gateway for interacting with nova (creation, deletion, etc.)
  • nova-scheduler (labcontrol) - schedule and launch instances
  • nova-compute - handles setup and tear down of instances on hypervisor
  • nova-conductor - DB broker for nova components other than nova-api


  • glance-api-http (control) - image management for instances


  • projects and users
    • check-novaobserver-membership - Make sure 'novaobserver' has 'observer' everywhere
    • check-novaadmin-membership - Make sure 'novaadmin' has 'projectadmin' and 'user' everywhere
    • check-keystone-projects - Verify service projects
  • services
    • keystone-http-${auth_port} - admin API port availability (little context)
    • keystone-http-${public_port} - public API port (little context)
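A port-availability alert of this kind can be pictured as a plain TCP connect that maps to Icinga/Nagios plugin status codes. The following is a minimal sketch under stated assumptions, not the actual check definition used for keystone: the function name `check_tcp_port` and its parameters are hypothetical, and the real checks may perform an HTTP request rather than a bare connect.

```python
import socket
from typing import Tuple

# Standard Nagios/Icinga plugin status codes.
OK, CRITICAL = 0, 2


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> Tuple[int, str]:
    """Hypothetical helper: report whether host:port accepts a TCP connection."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK: {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL: {host}:{port} unreachable ({exc})"
```

A wrapper script would print the message and exit with the returned status so that the monitoring system can interpret the result.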


  • check_designate_api_process: service API for DNS changes
  • designate-api-http: external monitoring of the API
  • check_designate_sink_process
  • check_designate_central_process
  • check_designate_mdns
  • check_designate_pool-manager


  • nfsd-exports - sets up /etc/export.d/ files for instances in cloud
  • interfaces - saturation in/out
  • ldap - there is a scheme to use LDAP for groups without having the entire system be an LDAP client.
  • secondary - checks specific to the 'secondary' Toolforge DRBD/NFSd cluster



  • tools-proxy - reverse proxy for all web tools
  • tools-checker-self - reverse proxy for running the actual checks. This is currently used to monitor Toolforge from the production Icinga.
  • tools-checker-ldap - without LDAP, Toolforge crumbles.
  • tools-checker-labs-dns-private - verify resolution for internal DNS from within Toolforge
  • tools-checker-nfs-home - NFS /home test (this is really a subpath of a single export for project and home)
  • tools-checker-grid-start-trusty - starting and running a process on grid
  • tools-checker-etcd-flannel - etcd is the backend for flannel which is our networking overlay for k8s
  • tools-checker-etcd-k8s - etcd is the persistent data store for k8s itself
  • tools-checker-k8s-node-ready - check to see if k8s thinks workers are healthy (nodes)
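As an illustration of what a check such as tools-checker-labs-dns-private verifies, the sketch below resolves a hostname and reports a plugin-style status. This is an assumption-laden example, not the tools-checker code: `check_resolves` and its injectable `resolver` parameter are hypothetical names, and the hostname to test would be whatever internal name the real check uses.

```python
import socket
from typing import Callable, Tuple

# Standard Nagios/Icinga plugin status codes.
OK, CRITICAL = 0, 2


def check_resolves(hostname: str,
                   resolver: Callable = socket.getaddrinfo) -> Tuple[int, str]:
    """Hypothetical helper: report whether hostname resolves to any address."""
    try:
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples.
        results = resolver(hostname, None)
    except socket.gaierror as exc:
        return CRITICAL, f"CRITICAL: {hostname} did not resolve ({exc})"
    # sockaddr[0] is the address string for both IPv4 and IPv6 entries.
    addresses = sorted({entry[4][0] for entry in results})
    return OK, f"OK: {hostname} -> {', '.join(addresses)}"
```

Passing the resolver as a parameter keeps the function testable without network access; run from inside Toolforge with the default resolver, it would exercise the instance's configured DNS.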


Ceph Cluster Health

Moved to a runbook: Portal:Cloud_VPS/Admin/Runbooks/Ceph_Cluster_Health