
Portal:Toolforge/Admin/Exim

Revision as of 21:18, 19 September 2017 by imported>BryanDavis (some notes related to our last incident)

Email maintenance

Currently all mail services run on tools-mail. A few pointers:

A Diamond collector (ExtendedExim), written for phab:T96898, publishes metrics under tools.tools-mail.exim.*

Queue Length

Testing outbound email on the Grid

jsub echo -e "Subject: Test message subject\n\nTest message" | /usr/sbin/exim -odf -i <email>

Blocking a sender from sending outbound mail

Edit /etc/exim4/deny_senders.list and add the envelope sender (MAIL FROM) address that you want to block on a new line.

$ echo 'wiktcapt@tools.wmflabs.org' | sudo -i tee -a /etc/exim4/deny_senders.list
wiktcapt@tools.wmflabs.org
$ sudo -i cat /etc/exim4/deny_senders.list
# Add MAIL FROM address to block. One per line
wiktcapt@tools.wmflabs.org
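During an incident the same address can easily get appended more than once. The following is a minimal sketch of a helper that only appends an address when it is not already listed. The function name `block_sender` is hypothetical and not part of any existing tooling, and the function has to run as root to write to the real list.

```shell
# Hypothetical helper: append an address to a deny list unless it is
# already listed. Usage: block_sender <address> [list-file]
block_sender() {
    addr=$1
    list=${2:-/etc/exim4/deny_senders.list}
    # -x matches the whole line, -F treats the address as a fixed string
    if grep -qxF -- "$addr" "$list" 2>/dev/null; then
        echo "already blocked: $addr"
    else
        printf '%s\n' "$addr" >> "$list"
        echo "blocked: $addr"
    fi
}
```

The optional second argument makes it possible to try the helper against a scratch file before touching the real list.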

Stopping all in/outbound mail

If an incident seems to be occurring, be safe and stop exim before you spend a lot of time digging into the root cause. Having our MX address put on a blacklist for spamming is worse than some downtime.

$ sudo -i puppet agent --disable "Investigating exim incident -- $USER"
$ sudo -i puppet agent -tv   # Check to see that Puppet is actually disabled
$ sudo -i service exim4 stop
$ ps axuww | grep '[e]xim'   # Did it really stop? (the brackets keep grep from matching itself)

Now you can investigate the queue without more messages going in or out. SMTP tolerates outages like this: sending servers queue mail and retry, so leaving things down for even a few hours is not a huge problem. Messages will be delivered eventually.
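A useful first question is usually which sender is responsible for most of the queue. The sketch below tallies senders from the queue listing printed by `exim -bp`. It assumes the standard listing layout (age, size, message ID, then the sender in angle brackets on the first line of each entry), and the function name `top_senders` is hypothetical.

```shell
# Hypothetical helper: read `exim -bp` output on stdin and print
# "<count> <sender>" lines, busiest sender first.
top_senders() {
    awk '
        # Entry header lines have at least four fields and a
        # dash-separated message ID in the third field.
        NF >= 4 && $3 ~ /^[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+$/ {
            n[$4]++
        }
        END { for (s in n) print n[s], s }
    ' | sort -rn
}
```

On the mail host this would be used as `sudo /usr/sbin/exim -bp | top_senders`. Recipient lines and blank lines in the listing are ignored, and bounce messages show up under the empty sender `<>`.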

Exim runs as a queue on each exec node

Each exec node (and most other hosts, actually) runs a local copy of exim that queues messages for outbound delivery via the tools-mail smarthost. Even if you have purged the queue on the smarthost before restarting exim following an incident, there may be messages queued across the grid, ready to flood in as soon as the service is back online.
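Before bringing the smarthost back, it may help to see how much mail is waiting on each node. This is a rough sketch only: it assumes you have a list of host names to feed in and ssh plus sudo access to each of them. The function name `queue_counts` and the `REMOTE` override are hypothetical. `exim -bpc` prints the number of messages in the local queue.

```shell
# Hypothetical helper: print "<host> <queued message count>" for every
# host name read from stdin. REMOTE is the command used to reach a host;
# it defaults to ssh and is overridable so the loop can be tried safely.
queue_counts() {
    while read -r host; do
        [ -n "$host" ] || continue
        # </dev/null stops ssh from swallowing the rest of the host list
        count=$(${REMOTE:-ssh} "$host" sudo /usr/sbin/exim -bpc </dev/null)
        printf '%s %s\n' "$host" "$count"
    done
}
```

Hosts with a large count are the ones whose local queues need inspecting or purging before exim on the smarthost is restarted.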

See also