Incident documentation/2017-01-11 multiversion
The multiversion code is poorly understood by many deployers. The code is complex and its entry points are a mess. An effort to clean this up is ongoing. On January 11th, a fairly involved refactor landed and caused a brief outage, despite testing in beta and on mwdebug*, and despite the canary checks.
- 18:28: Gerrit # was merged
- tested on beta, mwdebug, etc
- 18:56 demon@tin: Synchronized multiversion/MWMultiVersion.php: Attempt #2 for Multiversion cleanup (duration: 00m 41s)
- 19:27 demon@tin: Synchronized php-1.29.0-wmf.7/extensions/FlaggedRevs: Stupid errors (duration: 00m 46s)
- Not technically related, but weird autoloader bugs (also seen in TMH) became more apparent while testing this, so we backported a fix here
- 19:34 demon@tin: Synchronized multiversion: MWVersion fallbacks & such (duration: 00m 56s)
- outage immediately reported, began rollback
- PHP fatal error: Call to undefined method stdClass::get()
- 19:36 demon@tin: Synchronized multiversion: rollback (duration: 00m 56s)
The canary checks for MediaWiki remain insufficient to catch production errors before code rolls out live. mwdebug* is nice for testing specific config changes, but it does not get "real" traffic, so it's hard to test things extensively there. The multiversion code is incredibly fragile, but we knew this. This refactor is complicated and should be broken down even further than it already is; small changes are best for this endeavor.
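As a rough illustration of the kind of gate a canary check provides, here is a minimal sketch in Python. It is not the actual scap canary implementation: the function names, the host names, and the `max_failures` threshold are all hypothetical, and the probe is injected so the logic can be exercised without a network. The idea is only that a response counts as bad if it is a 5xx or if its body contains a PHP fatal like the one seen in this incident.

```python
from typing import Callable, Iterable


def response_ok(status: int, body: str) -> bool:
    """Hypothetical health rule: fail on any 5xx or on a visible PHP fatal."""
    return status < 500 and "PHP fatal error" not in body


def canaries_healthy(
    hosts: Iterable[str],
    probe: Callable[[str], bool],
    max_failures: int = 0,
) -> bool:
    """Return True if no more than max_failures canary hosts fail the probe.

    `probe` is an assumed callable that takes a host name and returns
    True when that host looks healthy (for example, by fetching a page
    and passing the result through response_ok).
    """
    failures = [host for host in hosts if not probe(host)]
    return len(failures) <= max_failures


if __name__ == "__main__":
    # Made-up canary hosts; "canary2" simulates a host that serves the fatal.
    fake_results = {
        "canary1": (200, "<html>ok</html>"),
        "canary2": (200, "PHP fatal error: Call to undefined method stdClass::get()"),
    }

    def fake_probe(host: str) -> bool:
        status, body = fake_results[host]
        return response_ok(status, body)

    if canaries_healthy(fake_results, fake_probe):
        print("canaries healthy, continue sync")
    else:
        print("canaries unhealthy, abort sync")
```

A gate like this only catches what the probe exercises. If the probe requests do not hit the code path that breaks under real traffic, the check passes and the bad code still goes out, which is the gap described above.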