
Wikimedia Cloud Services team/Clinic duties: Difference between revisions

From Wikitech-static
imported>Nskaggs
(Add service requests to list)
imported>Nskaggs
(shinken is dead. May it's memory live on.)
Line 9:
  ** Call people out for poor behavior in the channel
  ** Praise people for helping constructively
- * Monitor [https://icinga.wikimedia.org/icinga/ icinga] and [http://shinken.wmflabs.org/problems shinken] for alerts
+ * Monitor [https://icinga.wikimedia.org/icinga/ icinga] for alerts
  * Watch for wmcs-related cronspam and fix the causes when possible
  * Check the [https://grafana-labs.wikimedia.org/dashboard/db/tools-basic-alerts tools grafana board] for trends

Revision as of 21:52, 17 August 2020

The WMCS team practices a clinic duty rotation that runs from one weekly team meeting to the next. Team members take turns performing these duties in sequence.

🦄 of the week duties

andrew@cloud-cumin-01:~$ sudo cumin --force --timeout 500 -o json "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" | grep "Failed to apply catalog"

andrew@cloud-cumin-01:~$ sudo cumin --force --timeout 500 -o json "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" | grep -i unknown
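
The two commands above run the same check and differ only in the final grep, so the filters could be combined into a single pass. The following is a minimal sketch, not part of the original page: the helper name `filter_puppet_problems` is hypothetical, and the sample lines are invented for illustration (real cumin output will look different). Note that `-i` here makes both patterns case-insensitive, which is slightly broader than the first command above.

```shell
# Hypothetical helper: keep only lines that mention a failed catalog
# or an unknown state. Reads from stdin, writes matches to stdout.
filter_puppet_problems() {
    grep -i -E 'Failed to apply catalog|unknown'
}

# Invented sample input standing in for the check output.
printf '%s\n' \
    'host-a: OK: Puppet is currently enabled' \
    'host-b: CRITICAL: Failed to apply catalog' \
    'host-c: UNKNOWN: no summary file' | filter_puppet_problems
```

In practice the cumin invocation shown above would be piped into `filter_puppet_problems` in place of the sample `printf`.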

Maintenance tasks (probably not all weeks)