Incidents/20150625-LabsNFSOutage
Summary
There was a 12-minute Labs NFS outage (and, as a consequence, a Tools outage) caused by human error: reboot was accidentally entered on the wrong terminal. The system came back up, the arrays were assembled, and NFS was serving again 12 minutes after the reboot.
Timeline
16:35: Yuvi accidentally types reboot into labstore1002 instead of a labs instance awaiting NFS removal.
16:38: Machine comes back up, Coren starts assembling arrays and bringing the NFS server back up
16:47: All instances have recovered from the NFS stall; everything is working OK
Actionables
- Feed Yuvi to a Moose. Status: Not done. Yuvi, we love you!
- Install the molly-guard package on production hosts (https://phabricator.wikimedia.org/T103873). Status: Done
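For readers unfamiliar with the second actionable: molly-guard asks the operator to type the machine's hostname before a reboot issued over SSH is allowed to proceed. The Python sketch below illustrates that kind of check only. It is not molly-guard's actual implementation; the function names are hypothetical, and it assumes an SSH session can be recognised by the SSH_CONNECTION environment variable that OpenSSH sets.

```python
import os
import socket


def is_ssh_session(environ):
    """Return True if the environment looks like an SSH login.

    Assumption: OpenSSH sets SSH_CONNECTION for remote sessions.
    """
    return "SSH_CONNECTION" in environ


def confirm_reboot(typed_hostname, actual_hostname, environ):
    """Return True only if the reboot should proceed.

    Local sessions are allowed through unchanged; SSH sessions must
    type the machine's short hostname exactly.
    """
    if not is_ssh_session(environ):
        return True
    short_name = actual_hostname.split(".")[0]
    return typed_hostname.strip() == short_name


if __name__ == "__main__":
    host = socket.gethostname()
    typed = ""
    if is_ssh_session(os.environ):
        typed = input("Type the hostname of the machine to reboot: ")
    if confirm_reboot(typed, host, os.environ):
        # A real guard would hand over to the actual reboot command here.
        print("Hostname confirmed; reboot would proceed.")
    else:
        print("Hostname mismatch; refusing to reboot.")
```

With a check of this kind in place, typing reboot in a terminal connected to labstore1002 while believing it to be a labs instance would stop at the hostname prompt unless the operator also typed labstore1002.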