Incident documentation meeting/QR201407/group1/notes

Revision as of 22:13, 2 April 2020 by imported>Krinkle (Krinkle moved page Incident documentation/QR201407/group1/notes to Incident documentation meeting/QR201407/group1/notes: rename so that prefixindex/from works without these showing up as sorting "after" 2020)


  • migrated to m2 shard, shouldn't have too many load issues in future
  • analytics is responsible for responding to alerts that come from analytics
  • ops is responsible for generic looking database alerts
  • EL can be down or lagging for up to 48 hours (weekends) - "Tier 2" support


  • would have been good to have Ariel on the call
  • greg to follow up on explicit next steps with Bryan and Reedy
  • Add to next group's list


  • all green :)
  • seems all bases are covered here, any disagreement? :)


  • blog work, loop back with RobH re future of that box? HA? etc?
  • how far away to get rid of blog?


  • still need to create reproducible steps for this to be reported upstream
  • still need to manually remove a sick node (on purpose)



  • MediaWiki failed to stop trying to use the bogged-down machine
    • Greg: need to get this diagnosed and tracked
    • HHVM's impact here?
  • proposal 4 related to Rashomon?