You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Incident documentation/2017-10-17 webm-audio-mediatype
audio/webm files were incorrectly labeled as
After having fixed the cause (https://phabricator.wikimedia.org/T156135), there were a few old entries in the database that had to be fixed (
Because of the small amount of rows to be updated, mlitn misjudged the impact.
mlitn logs in to tin & connects to mysql commonswiki.
~12:58: mlitn runs
UPDATE image SET img_media_type="AUDIO", img_major_mime="audio" WHERE img_media_type="VIDEO" AND img_major_mime="video" AND img_minor_mime="webm" AND img_metadata LIKE '%s:9:"mime_type";s:10:"audio/webm";%'; Query OK, 30 rows affected (0.35 sec) Rows matched: 30 Changed: 30 Warnings: 0
~13:17: mlitn runs
UPDATE oldimage SET oi_media_type="AUDIO", oi_major_mime="audio" WHERE oi_media_type="VIDEO" AND oi_major_mime="video" AND oi_minor_mime="webm" AND oi_metadata LIKE '%s:9:"mime_type";s:10:"audio/webm";%'; Query OK, 4 rows affected (7.45 sec) Rows matched: 4 Changed: 4 Warnings: 0
~13:20: mlitn runs
UPDATE filearchive SET fa_media_type="AUDIO", fa_major_mime="audio" WHERE fa_media_type="VIDEO" AND fa_major_mime="video" AND fa_minor_mime="webm" AND fa_metadata LIKE '%s:9:"mime_type";s:10:"audio/webm";%'; Query OK, 94 rows affected (8.46 sec) Rows matched: 94 Changed: 94 Warnings: 0
~13:25: zeljkof & jynus notice lag on s4 (https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1053&var-port=9104&from=1508243685418&to=1508248037877)
~13:25: mlitn forgot to
!log what he was doing: there was confusion, and execution of another (unrelated) script was halted as a result
As a general comment, the reason why it has so much impact was because, despite the number of rows being very low, the actual number of rows read was very high (probably most or a significant part of the table, which not only was read, but also blocked, preventing other writes to go on, and blocking replication on the non-masters. There are scripts such as pt-archiver that assure writes are done by primary key on very small batches, ensuring minimal impact (e.g. for deleting >30M rows on commmonswiki https://phabricator.wikimedia.org/T177772#3670119 ).
- Scripting should properly have been done instead of running from CLI
- DBAs should have been pinged, who are in a better position to judge impact & advise
!logshould have been used to warn what was about to be made
- Before performing database edits- backups should be performed to allow easy reversions
- Should dangerous maintenance scripts executions be limited/managed differently/documented better?