2015-05-19 Mailman outage


The mailing lists underwent maintenance on 2015-05-19, starting at 17:00 UTC. The window was expected to last until 19:00 UTC. Due to unexpected issues, the mailing list server was offline and experiencing errors until roughly 21:00. A mailman configuration patch, merged more than a month earlier, contributed to the outage by causing an unexpected issue when the mailman service was restarted. Tracking down and fixing this issue was a team effort by Rob, JohnFLewis (volunteer), and Daniel.

The 'outage' was in fact large-scale moderation: all messages sent to lists were held for moderation, even those from list members. List moderators should be able to release all held messages for delivery.


  • 17:04 Mailman maintenance window started per T99098
  • ~18:00: Rob finishes the first of two changes and notices some odd errors on mailman restart (these should have been examined more closely).
  • 19:00 Changes per T95195 & T99136 are completed. Testing of the changes reveals that some or all mailing list messages are being held for moderation (even when the sender is a member of the list).
  • 19:00-21:00: Troubleshooting of all steps taken during the maintenance window. John discovers the old patchset, and live-hack testing determines the solution. Patchset reverted and pushed live.


  • lack of central logging (only roots can read the logs on sodium for troubleshooting); should mailman logs route to central logging?
  • configuration changes need to be accompanied by a mailman restart at the time of the change
  • unrelated: someone hand-edited the mbox file on wiki-research-l and then didn't rebuild the archives. When we rebuilt them today, the archive was renumbered; this would have been best caught when the edit was originally made.


  • Ensure ALL configuration changes are tested and the production service is restarted at the time of the configuration change.
  • DONE via [1]
  • This may be a good case for a mailman-admins group, similar to other service groups.
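
One way to catch a configuration change that has not yet been exercised by a restart is to compare file modification times against the time the service last started. The following is a minimal sketch under stated assumptions, not something used during this incident: the function name `changed_since_start`, the file name `example.cfg`, and the way the service start time is obtained are all hypothetical, and the caller is assumed to supply the start time as a Unix timestamp.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def changed_since_start(config_paths: Iterable[Path], service_started_at: float) -> List[Path]:
    """Return the config files whose mtime is later than the service start time.

    service_started_at is a Unix timestamp supplied by the caller (assumption:
    how it is obtained depends on the host's init system).
    """
    return [
        path
        for path in config_paths
        if path.exists() and path.stat().st_mtime > service_started_at
    ]


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a real config file.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "example.cfg"  # hypothetical file name
        cfg.write_text("setting = 1\n")
        os.utime(cfg, (2000.0, 2000.0))  # pretend the file was edited at t=2000
        # Service started before the edit: the file is reported.
        print(changed_since_start([cfg], service_started_at=1000.0))
        # Service started after the edit: nothing is reported.
        print(changed_since_start([cfg], service_started_at=3000.0))
```

A non-empty result would indicate configuration that the running service has never loaded, which is the situation that surfaced at restart time here.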