
Troubleshooting "exim queue warning" alerts

This alert fires when the number of queued emails on an Exim server exceeds the defined threshold, usually about 1000 messages. A few possible causes are:

  • A remote domain that receives a lot of our email is down, and we are queueing messages for redelivery.
  • A user who receives a lot of our email has a problem with their mail inbox (over quota, account removed, etc.) and we are queueing messages for redelivery.
  • Problematic messages are being relayed through our mail system and are being temporarily rejected (rate limiting, spam prevention, etc.), causing us to queue valid mail for redelivery.

To understand why the alert fired, review the size of the queue on the relevant host(s) and check whether the problem is localized to a single domain or affects multiple domains. For example:

root@mx1001:~# mailq | exiqsumm -c

Count  Volume  Oldest  Newest  Domain
-----  ------  ------  ------  ------

  881  5700KB      4d      0m
   21   104KB    126d    126d
   16    47KB    259d     11d
   14    35KB    259d     39m
   13    38KB    259d     70d
   11    23KB    259d     18d
    9    21KB    259d     13d
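When the summary is long, it can help to see each domain's share of the queue as a percentage. The following is a minimal sketch, assuming the usual exiqsumm column layout shown above (count first, domain last, and a trailing TOTAL row); the function name queue_share_by_domain is hypothetical and not part of Exim.

```shell
# Hypothetical helper: reads exiqsumm output on stdin and prints the
# percentage of queued messages held for each domain.
queue_share_by_domain() {
    awk '
        # Data rows start with a numeric count; headers, rules and the
        # TOTAL row are skipped.
        $1 ~ /^[0-9]+$/ && $NF != "TOTAL" {
            n++
            count[n] = $1
            domain[n] = $5
            total += $1
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%5.1f%%  %s\n", 100 * count[i] / total, domain[i]
        }
    '
}

# Usage on a mail host:
#   mailq | exiqsumm -c | queue_share_by_domain
```

A single domain holding most of the queue points at a remote problem; an even spread suggests something on our side.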

In this case the vast majority of mail in the queue is for a single domain, so we know the problem is not affecting multiple domains. Next we might wonder how many users at that domain are affected. We can get a quick count of deferred messages per recipient for a given domain with something like the following:

mx1001:~# mailq | grep $problem_domain | grep -v D | sort | uniq -c | sort -n
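The grep -v D filter above drops any line containing a capital D, which can also hide addresses that happen to contain that letter. A slightly stricter sketch is shown here, assuming the default queue listing format in which each undelivered recipient sits alone on an indented line and delivered recipients are prefixed with a D field. The function name pending_by_recipient and its domain argument are hypothetical.

```shell
# Hypothetical helper: reads mailq (exim -bp) output on stdin and counts
# undelivered recipients at the given domain, least affected first.
# $1: recipient domain to filter on
pending_by_recipient() {
    awk -v domain="$1" '
        # Undelivered recipient lines hold a single field: the address.
        # Message header lines have four or more fields, and delivered
        # recipients have two ("D" plus the address).
        NF == 1 {
            addr = tolower($1)
            suffix = "@" tolower(domain)
            start = length(addr) - length(suffix) + 1
            if (start > 1 && substr(addr, start) == suffix)
                seen[addr]++
        }
        END {
            for (a in seen)
                printf "%d %s\n", seen[a], a
        }
    ' | sort -n
}

# Usage on a mail host:
#   mailq | pending_by_recipient "$problem_domain"
```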


OK, from this we can see that a single user is responsible for the vast majority of queued messages. With this information we can look for a reason in the Exim logs. This may take some hunting through the logs, depending on the nature and scope of the problem. Something like the below is a good start and will often give hints:

# grep -i $problem_domain /var/log/exim4/mainlog | grep -i error
# for example:

mx1001:~# grep -i <domain> /var/log/exim4/mainlog | grep -i error

# example error: []: SMTP error from remote mail server after RCPT TO:<>: 452-4.2.2 The email account that you tried to reach is over quota. Please direct\n452-4.2.2 the recipient to\n452 4.2.2 s27si1505313edm.307 - gsmtp

If errors are present, you should see whether they are 4xx (temporary) or 5xx (permanent) errors, along with a short description of the problem as provided by the recipient's mail system.

In the example above we can see a recipient is over quota, and their mail provider is temporarily rejecting messages with a 452 code, so our mail server is queueing them. If the user receives a lot of mail this can push the check over the alert threshold.
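To get a quick sense of whether the remote side is mostly deferring (4xx) or rejecting outright (5xx), the log lines can be tallied by response class. This is a rough sketch, assuming the log lines carry the "SMTP error from remote mail server after ...: NNN" wording seen in the example above; lines in other shapes are ignored. The function name classify_smtp_errors is hypothetical.

```shell
# Hypothetical helper: reads Exim mainlog lines on stdin and prints how many
# remote SMTP errors were temporary (4xx) versus permanent (5xx).
classify_smtp_errors() {
    # Keep only the first digit of the SMTP reply code that follows the
    # "after <command>:" part, then count occurrences of each class.
    sed -nE 's/.*SMTP error from remote mail server after [^:]*(:<[^>]*>)?: ([45])[0-9]{2}[- ].*/\2xx/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a mail host:
#   grep -i "$problem_domain" /var/log/exim4/mainlog | classify_smtp_errors
```

A tally dominated by 4xx means the queue should drain once the remote problem clears; a run of 5xx usually ends in bounces rather than queued mail.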

show mail queue

exim -bp

flush mail queue

exim -q

force delivery attempt

exim -qf (non-frozen messages)
exim -qff (all messages, frozen or not)

deliver just one specific mail from queue

exim -M [queue-id]

(the queue-id is what you see after the size and before the email address)

search for specific mails in the queue

exiqgrep -f <sender address>
exiqgrep -r <recipient address>
man exiqgrep for more options

count number of mails to a specific recipient

 exiqgrep -cr <recipient address>

remove emails to a specific recipient

exiqgrep -i -r <recipient address> | xargs exim -Mrm
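Removal cannot be undone, so it may be worth previewing what would be deleted first. The sketch below is one possible dry run, assuming GNU xargs (for the -r flag) and message IDs supplied one per line, as exiqgrep -i prints them. The function name preview_removal and the $recipient variable are hypothetical.

```shell
# Hypothetical helper: reads message IDs on stdin and prints the removal
# commands that would be run, in batches of 50, without executing them.
preview_removal() {
    # -r: print nothing when there is no input (GNU xargs)
    # -n 50: at most 50 message IDs per printed command
    xargs -r -n 50 echo exim -Mrm
}

# Usage on a mail host:
#   exiqgrep -i -r "$recipient" | preview_removal
```

Once the printed commands look right, the same pipeline can be run with xargs exim -Mrm in place of the preview.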

test address routing

On e.g.

exim -bt <address>

cheat sheet