You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Someone ran some sort of scanner script against this service, causing the backend daemon to crash twice. Service recovered on its own, total downtime ~10 mins.
- 19:48 Apparent vulnerability scan from 126.96.36.199 (212-173.206-83.static-ip.oleane.fr.) begins, about 2000 404s seen in zirconium:/var/log/apache/error.log:
[Mon Jul 14 19:59:42 2014] [error] [client 188.8.131.52] File does not exist: /var/www/planet/en/webavis [Mon Jul 14 19:59:43 2014] [error] [client 184.108.40.206] File does not exist: /var/www/planet/en/webmail [Mon Jul 14 19:59:43 2014] [error] [client 220.127.116.11] File does not exist: /var/www/planet/en/webnews [Mon Jul 14 19:59:43 2014] [error] [client 18.104.22.168] File does not exist: /var/www/planet/en/webroot [Mon Jul 14 19:59:44 2014] [error] [client 22.214.171.124] File does not exist: /var/www/planet/en/webyep-system
- 19:50 zirconium:/var/log/etherpad-lite/etherpad-lite.log starts saying strange things, daemon restarts twice:
[2014-07-14 19:50:31.068] [ERROR] console - Error: Failed assertion: Can't have op.lines of 2 in attribution lines at Object.error (/usr/share/etherpad-lite/src/static/js/Changeset.js:39:11) [2014-07-14 19:50:31.068] [INFO] console - graceful shutdown... RESTART! start... [...] [2014-07-14 19:53:43.458] [INFO] console - graceful shutdown... RESTART! start...
- 19:53 Etherpad-lite daemon at zirconium:9001 stops accepting connections.
Apache error log: [Mon Jul 14 19:53:46 2014] [error] [client 126.96.36.199] (20014)Internal error: proxy: error reading status line from remote server 127.0.0.1:9001 [Mon Jul 14 19:53:46 2014] [error] [client 188.8.131.52] proxy: Error reading from remote server returned by /p/SULfinalisationengineering [Mon Jul 14 19:53:46 2014] [error] [client 184.108.40.206] (20014)Internal error: proxy: error reading status line from remote server 127.0.0.1:9001, referer: https://etherpad.wikimedia.org/p/Reproducibility_Workshop
- 19:54: Greg-g reports on IRC that etherpad.wikimedia.org is down, giving a "Service Temporarily Unavailable" message.
- 19:59: Scanning ends, according to logs
- 20:01: Service is available again
- Service recovered without administrative action
- Icinga didn't detect this outage because it only checks for a running process rather than an HTTP 200 response from port 9001. Plus since Etherpad auto-restarted, Icinga wouldn't have caught it anyway unless itchecked for a PID change.
- Ganglia shows CPU & loadavg spikes (small), no significant change in net traffic
- Etherpad is backed by db1001, no significant impact observed in Ganglia
- Unclear what HTTP response code was sent during the outage
- We are already running the newest release of Etherpad-lite, 1.4.0. No mention of security concerns from upstream https://github.com/ether/etherpad-lite/ (though they do have 255 issues open!), no relevant entries in CVE DB.
- No suggestion of security vulnerability at this time
- Configure etherpad-lite.log rotation, currently: 786MB since 2013-11-23
The above has been done. The new package 1.4.0-2 now has logrotate support and now etherpad-lite finally no longer uses console logging but logs directly to /var/log. Relevant changes are: https://gerrit.wikimedia.org/r/#/c/151115/ and https://gerrit.wikimedia.org/r/#/c/151106/ . Due to the migration to upstart that was adder in the latter, downtime due to the software malfunctioning will be shorter (faster restarts by upstart)
Etherpad-lite stack trace
[2014-07-14 19:53:43.458] [ERROR] console - Error: Failed assertion: Can't have op.lines of 2 in attribution lines at Object.error (/usr/share/etherpad-lite/src/static/js/Changeset.js:39:11) at Object.assert (/usr/share/etherpad-lite/src/static/js/Changeset.js:53:13) at outputMutOp (/usr/share/etherpad-lite/src/static/js/Changeset.js:1167:15) at Object.exports.mutateAttributionLines (/usr/share/etherpad-lite/src/static/js/Changeset.js:1213:9) at async.series.callback.forwardsChangesets (/usr/share/etherpad-lite/src/node/handler/PadMessageHandler.js:1392:19) at async.series.results (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:486:21) at _asyncMap (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:185:13) at async.forEachSeries.iterate (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:108:13) at async.forEachSeries.iterate (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:119:25) at _asyncMap (/usr/share/etherpad-lite/src/node_modules/async/lib/async.js:187:17) [2014-07-14 19:53:43.458] [INFO] console - graceful shutdown... RESTART!
The above has been reported in https://github.com/ether/etherpad-lite/issues/2215 . No reaction after 2 days