You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Wikimedia Cloud Services team/Clinic duties"

From Wikitech
imported>BryanDavis
(s/on-call/clinic duty/)
 
imported>Andrew Bogott
 
Line 13:
 
* Triage [https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-msyn2z45n7mw45bfuscb&statuses=open()&group=none&order=newest#R|new tasks added to Phabricator in the #cloud-services project]
 
 
* Check for broken puppet on VMs (owners get daily emails from [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/base/files/labs/puppet_alert.py puppetalert.py] but you can contact them if an instance is un-puppetized for a particularly long time):
 
  andrew@labpuppetmaster1001:~$ sudo cumin --force --timeout 500 -o json  "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" | grep "Catalog fetch fail"
+
  andrew@cloud-cumin-01:~$ sudo cumin --force --timeout 500 -o json  "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" | grep "Catalog fetch fail"
 
* Run [[Add_a_wiki#Cloud_Services|<code>maintain-views</code> and <code>maintain-meta_p</code> on labsdb*]] as needed for new tables/wikis
 
 
* Monitor https://toolsadmin.wikimedia.org/tools/membership/ for new requests and process them
 

Latest revision as of 20:20, 12 September 2019

The Wikimedia Cloud Services (WMCS) team practices a clinic duty rotation that runs from one weekly team meeting to the next. Team members take turns performing these duties, one after another.

🦄 of the week duties

andrew@cloud-cumin-01:~$ sudo cumin --force --timeout 500 -o json  "A:all" "/usr/local/lib/nagios/plugins/check_puppetrun -w 3600 -c 86400" | grep "Catalog fetch fail"
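The final stage of the command above is a plain text filter: it keeps only the output lines that contain "Catalog fetch fail". As a rough sketch of that filtering step on its own, the snippet below runs the same grep over a few sample lines and counts the matches. The helper name `catalog_failures`, the host names, and the sample lines are all invented for illustration; they are assumptions and do not reproduce real cumin or check_puppetrun output.

```shell
#!/bin/sh
# Hypothetical helper: pass through only lines that report a catalog fetch failure.
# "|| true" keeps the exit status at 0 when nothing matches.
catalog_failures() {
    grep "Catalog fetch fail" || true
}

# Invented sample lines (NOT real cumin/check_puppetrun output).
sample_output='host-a: OK: Puppet is currently enabled
host-b: CRITICAL: Catalog fetch fail
host-c: OK: Puppet is currently enabled'

# Filter the sample, then count how many lines matched.
failures=$(printf '%s\n' "$sample_output" | catalog_failures)
count=$(printf '%s\n' "$failures" | grep -c "Catalog fetch fail" || true)

echo "$count line(s) report a catalog fetch failure"
printf '%s\n' "$failures"
```

In practice the text on standard input would come from the cumin command shown above rather than from a hard-coded sample.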

Maintenance tasks (probably not all weeks)