http://www.iki.fi/era/mail/procmail-presentation.html
$Id: procmail-presentation.prep,v 1.14 2000/09/18 09:53:45 era Exp $
This is a short presentation of Procmail. It is intended for use as a handout in conjunction with a performance of some sort (preferably with a screen projector and a nice Unix prompt), but you may not enjoy it less without the performance.
Prerequisites: A little bit of shell programming and a willingness to experiment.
This is strictly from an end-user perspective; mail administration on a system-wide scale is totally outside the scope of this presentation.
Procmail is a tool for processing mail. It can be used to dispatch your mail or run a script on incoming messages which match a certain pattern. Most frequently it's probably used for rejecting spam and setting up various autoresponders. But it's an extremely versatile tool which isn't even restricted to use with mail (although you have to have a taste for the curious to really appreciate the other possibilities).
The orientation of this presentation is to give an overview, with glances at some of the more powerful features of Procmail. Pointers to self-study material will be given along the way.
Here's a quick breakdown of today's topics.
Alcohol is the answer, but I can't remember the question
If you're just curious, here are some questions which could be answered with "Procmail". If you never ask yourself these types of questions, perhaps Procmail is not for you.
Procmail might be the answer if
Certainly some canned solutions are available, so if your needs are simple and straightforward, you can probably just copy somebody's existing Procmail files and take it from there.
If something goes wrong, you probably need to understand something about files and the Unix file permission system, as well as be able to ask your system administrator moderately intelligent questions.
To return to our questions above, here's how Procmail can help you with the problems they were about.
- How can I keep mailing lists separate from my other mail?
- I get a lot of mail and some of it is always urgent, can I make it come out on top when I open my mail program?
- Mail sorting means saving mail to different folders depending on e.g. the Subject line or the author of each message.
- This means you can divert mailing lists to low-priority folders which you will read when you have the time, and allow only important messages into your regular inbox.
- How can I keep out messages from a certain person?
- This is really just a special case of sorting. Sort them to a very low-priority folder, such as /dev/null
- I get these automatically generated messages which are in an inconvenient format, can they be fixed?
- Procmail allows you to run a regular Unix filter on any message, again depending on the Subject, author, recipient, priority, or message contents if that's what you want.
- (Filters are programs such as tr or sort -- although those are not very meaningful examples in this context.)
- Obviously, if your needs are special, you will need to be able to write the actual filter yourself. Frequently a simple Perl or sed script is all it takes, though.
- How can I send people a certain message automatically if they mail me with a particular Subject line?
- Autoresponders are essentially just a special sort of filtering.
- I get a lot of mail but some of it is not really meant for me. Can I automatically pass it on to my assistant?
- Forwarding is again basically another sort of filter -- a filter which sends on the mail to a new address. Actually Procmail has some nice built-in functions which make this particularly easy and elegant.
It's beneficial to understand how Procmail fits into a larger picture before we look at how to do things in practice.
Here's a standard reference model of a mail system. It has three fairly independent components:
These are the Mail Transport Agent, the Mail Delivery Agent, and the Mail User Agent. Each one of these has a distinct role.

                          +--------------+
                          |  Remote MTA  |
                          +--------------+
                                 ||
    Internet                    \  /
                                 \/
                          +--------------+
    ----------------------|     MTA      |----------------------
                          +--------------+
                            ||        ||
                            \/        \/
    +-----+               +-----+   +-----+
    | MUA |<----- user    | MDA |   | MDA |
    +-----+               +-----+   +-----+
       ^                     |         |
       |                     |         |
    mailbox file <-----------+         +--> other types of delivery? (IMAP?)
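Here's a brief discussion of the acronyms on the previous slide:

- MTA (Mail Transport Agent): moves mail from one host to another. Sendmail is the classic example.
- MDA (Mail Delivery Agent): puts an incoming message in its final place, such as your mailbox file. Procmail is an MDA; deliver(8) is another.
- MUA (Mail User Agent): the program you read and write mail with, such as mailx(1).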
The standard model has a few more elements which are however less central to this discussion. The Zmailer docs have a more complete picture as well as a broader discussion.
A corollary to this picture is that Procmail is basically useful only for processing incoming mail, although if your outgoing mail processing has hooks to invoke a script on outgoing mail, Procmail will work perfectly for that as well. But such hooks are not standard in mainstream mail systems today.
(A note for the curious: Sendmail is properly an MTA but it has built-in delivery functions and you can use it from the command line to send mail, so it's even a primitive MUA.)
In practice, Procmail is rarely invoked from the Unix prompt, except for testing your Procmail settings. By its nature, it's an autonomous program which is intended to be run by your MTA (to deliver incoming mail to /var/spool/mail/era or an IMAP folder somewhere or something like that).
With Sendmail, Procmail can be installed as everybody's MDA (site-wide installation) but if that's not the case, Sendmail lets each user run arbitrary delivery scripts via the $HOME/.forward file.
The "standard" .forward file for invoking Procmail is fairly complicated, but people often just copy and paste it from the manual page. Remember to change the last part to your own login name!
(The manual page is different on different sites, but the usual variations are explained in http://www.iki.fi/era/procmail/mini-faq.html#forward in some [too much?] detail.)
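Regardless of how exactly Procmail is invoked, it goes through something like the following:

- Read /etc/procmailrc, if it exists.
- Read $HOME/.procmailrc, if it exists.
- If no recipe has delivered the message by then, deliver it to DEFAULT (usually something like /var/spool/mail/era).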
(This also means that you can figure out whether or not you need a .forward: If strange things start happening when you create $HOME/.procmailrc, Procmail is already being invoked without your knowledge.)
Note that both Sendmail and Procmail are quite easily offended when it comes to who is allowed to read and write these files. As a rule, you should always take care that your home directory and your dot files are not writable by anybody else. (Whether you make them readable is up to you. Sendmail usually runs under your own user ID by the time it reads any dot files of yours, so these files should not need to be world-readable.)
Before we can start tackling mail handling, we also need to talk about the parts of a mail message. These are what Procmail fundamentally works with to decide how to deliver a message, guided by the rules you write.
A mail message is divided into headers and body. The headers contain address and route information and the Subject line. The body is the actual text of the message.
The body is completely free-form, while the headers are fairly rigid. Each header consists of a keyword (possibly several tokens with hyphens between them) followed by a colon, and then the value of the field. If the value is long, it may be split over many lines so that each continued line starts with a space or tab character.
(There's a single empty line after the header, with no space or tab on it, which separates the header from the body; this is occasionally called the neck.)
Here's a minimal example message:
    From era Fri Jul 14 09:29:20 2000
    Received: from schildt.ling.helsinki.fi by stoker.lingsoft.fi
    Received: from localhost by schildt.ling.helsinki.fi
    From: era eriksson <[email protected]>
    To: era eriksson <[email protected]>
    Subject: Finally something even you can understand!
    Date: Thu, 13 Jul 2000 20:42:12 +0300 (EET DST)
    Message-Id: <[email protected]>

    Moo.

Many MUAs will hide away a lot of the headers, because many of them are extremely uninteresting unless you try to debug the mail system itself. The interesting headers appear highlighted.
The very first line, the "From" without a colon, isn't part of the headers really. It's a separator string which marks the beginning of a new message in this particular file storage format. This is stored as a "Berkeley" or "mbox" message, which is probably the most widespread format in use today. (If you have your mail in /var/spool/mail/you it's probably in Berkeley format.)
Note that the order of headers is by and large insignificant. From, To, and Subject can occur in any order, and there may be other headers in between.
Incidentally, Procmail does not make any particular assumptions about the contents of the headers (and neither should your recipes make such assumptions, hopefully). In other words, Procmail doesn't care if the headers contain 8-bit (or 16-bit or 32-bit) data -- in violation of RFC822 -- nor does it try to decode or canonicalize or otherwise modify input data in any way (though you can certainly implement recipes to achieve such canonicalizations if you want them). Similarly, MIME headers and data are fundamentally just ASCII, and you can write regular expressions to identify the "magical" MIME stuff just as well as any other predictably formatted strings. (However, MIME is a complex beast, and covering all possible variants of how MIME may validly encode the same message in different ways is more frustration than fun.)
The rules for what to deliver where are called recipes. The syntax is vaguely related to shell scripts in some ways, but basically, the formalism is completely unlike anything else. It takes some getting used to, but it's fairly succinct and surprisingly simple to understand once you get used to it.
In abstract terms, each recipe has a mode, some conditions (optionally), and an action.
If the conditions are left out, the recipe is unconditional, obviously.
Recipes are read from top to bottom. The first delivering recipe terminates the delivery process (unless you specify otherwise with the mode flags).
Conditions are typically regular expressions, although there are other interesting possibilities as well.
Incidentally, if you have read this far without losing interest, but you don't know what a regular expression is, you'd do well to read a basic Unix book. It will tell you what you need to know in order to get started with Procmail.
Here's a fairly standard simple .procmailrc file.

    SHELL=/bin/sh    # Good habit to always have this
    MAILDIR=$HOME/Mail
    LOGFILE=$HOME/Mail/procmail.log

    :0:
    * ^Subject: test
    testing

The first three lines are variable assignments, not recipes. The following chunk of three lines is a simple recipe.
The recipe has a condition, which says that the action should be taken only if the message being processed has a header matching the regular expression ^Subject: test

The action is to save the message to the mailbox file (or "folder") testing. This file doesn't have an absolute pathname, so it will be created in the directory named in the MAILDIR variable.
Here's that recipe again:

    :0:
    * ^Subject: test
    testing

Notice the following details:
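As a rough sketch of what each line of this recipe does, here is the same recipe with annotations. The comments are added for illustration only; they sit above the recipe because condition lines cannot carry comments of their own.

```procmail
# Line 1  ":0"  starts a new recipe; flags, if any, would follow the 0.
#         ":"   the trailing colon asks Procmail to use a lock file, so
#               that two deliveries never write to the folder at once.
# Line 2  "*"   starts a condition; the rest of the line is a regular
#               expression, matched against the headers and ignoring
#               case unless flags say otherwise.
# Line 3        the action; a plain file name means "append the message
#               to this folder", relative to $MAILDIR.
:0:
* ^Subject: test
testing
```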
The colon line can carry a lot more information than just a single colon. This is probably the hardest part of Procmail to learn because the flags are single-letter options -- sometimes with no meaningful mnemonic -- although they can have a quite significant impact on what a recipe actually does.
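Here are some useful flags:

- B: match the conditions against the body of the message instead of the headers.
- HB or BH: match against both the headers and the body.
- D: distinguish between upper and lower case when matching (the default is to ignore case).
- c: work on a copy ("carbon copy") of the message, so that processing continues with the next recipe.
- f: treat the action as a filter, so that the message is replaced by the output of the command. (You could use tr A-Z a-z to turn all text into lowercase, for example.) This obviously cannot be a delivering recipe.

Note that the flags are case-sensitive: the b flag does something different from the B flag.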
You get a brief listing of all the flags with procmail -h at the Unix prompt.
Here's an example of a real-life recipe combo with a lot of flags:

    :0fhw   # Simplify headers by folding any continued lines back
    * ^[^ :]*:[ ][ ]|$[ ]
    | formail -cz

    :0Ahcw: # If the previous recipe matched, save a copy of headers
    $HOME/headers.log

    # Processing continues here even though previous recipe "delivered"
    :0      # Pass this message on to Steve
    ! [email protected]

This also demonstrates the other two types of actions: Feeding a message to an arbitrary shell command (signalled by the pipe character |), and forwarding a message to another address (signalled by the exclamation mark !). (Decoding the finer points is left as an exercise :-)
Procmail's regular expression support is fairly good, although it's not quite as supercharged as Perl's. The closest related regex flavor is probably Egrep's, although Procmail has some interesting features (and some not so stunning quirks).
For a beginner, the important thing is to realize that the condition has to match exactly. If you write

    * ^From: fred flinthe

then this condition will not match if there are two spaces after the colon, or if Fred changes his email program's setup so that it reads "Fred F. Flinthe" in the From: field instead, or if the header had been

    From: [email protected] (Fred Flinthe)

all along.
(And don't forget the leading star which signals the start of a condition. Leaving it out produces some hilarious / hysterical mystery errors which can be very hard to figure out.)
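Returning to the Fred example, one hedged way to make such a condition more forgiving is to anchor only on the header name and let .* absorb whatever comes before the part you care about. The sketch below assumes that Fred's From: line always contains the string "flinthe" somewhere; the folder name fred is made up for the example.

```procmail
# Matches "From: fred flinthe", "From:  Fred F. Flinthe" and
# "From: <address> (Fred Flinthe)" alike: Procmail ignores case by
# default, and ".*" skips over anything between the colon and the name.
:0:
* ^From:.*flinthe
fred
```

The price of the looser pattern is that it also catches anybody else whose From: line happens to contain "flinthe".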
The other source of newbie confusion comes from not realizing that .* (dot star) is the "match anything" wildcard in regex-ese (not just star, and not whatever else that happens to work in INTERCAL or Visual Basic or JavaScript).

... And/or not realizing that * and . (full stop) and [ and ] and ( and ) and ? and all their friends have a special "magical" meaning in regex-ese. So if you want to match any of them literally, you need to put a backslash in front.
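A small sketch of what that backslashing looks like in practice. The Subject strings and folder names here are invented for the example.

```procmail
# A list tag such as "[announce]" contains literal brackets, so both
# need a backslash; without them, [announce] would be a character class.
:0:
* ^Subject:.*\[announce\]
announce

# A literal question mark needs the same treatment.
:0:
* ^Subject:.*are you there\?
questions
```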
The next step after you've learned the absolute basics -- for some users, at least -- is the hypercorrectness stage, where you backslash everything which "looks magical" just to be on the safe side. This is dangerous, too, at least if taken to extremes.
The only really unusual things in Procmail's regex support are the \< and \> "word boundary" operators, which are actually just a shorthand for a character class; the predefined macros ^TO_ and ^FROM_MAILER and ^FROM_DAEMON; and the weird \/ grab operator (highly magical -- pity we don't have real backrefs, though).
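To make the macros and the grab operator a little more concrete, here is a minimal sketch. The list address, the folder names, and the SUBJECT variable are invented for the example, and it assumes a Procmail version which knows the ^TO_ macro.

```procmail
# Bounces and other automatically generated mail
:0:
* ^FROM_DAEMON
daemon

# Anything addressed (To, Cc, and friends) to a hypothetical list address
:0:
* ^TO_procmail-fans@example\.org
procmail-fans

# Grab the text of the Subject line: whatever matches to the right
# of \/ ends up in the MATCH variable.
:0
* ^Subject:[ ]*\/[^ ].*
{
    SUBJECT=$MATCH
}
```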
Another slightly unusual, but highly convenient feature -- which you will find is quite natural if you're used to working with regular expressions -- is the behavior of the $ and ^ line anchors: they actually match a literal newline if you use them anywhere inside a regular expression.

Finally, there is also a ^^ anchor which is similar to Perl's "beginning/end of search space" operator, which matches on the beginning/end of the header, or body, or whatever you are matching on. (You can match on the values of variables, too.)
That's right, you can match multi-line values with a single regular expression. Here's an example which adds a Secret Subliminal Header to messages with a lot of empty lines (i.e. adjacent newlines) right at the top of the body (only incoming messages, fortunately).
    :0Bfhw  # Look for a lot of empty lines at the top of the body
    * ^^$$$$$
    | formail -a "X-Plode: Poor Netscape luser, I pity you."
Another concept which is tricky to learn is deliveredness. This is a fundamental thing to understand because it determines which recipes will be activated when.
The first successful delivering recipe will cause Procmail to terminate. A delivering recipe is like exit 0 in a shell script -- it says "we're done". If that's not what you want to say, you need to know how to phrase your recipe differently.
A recipe with a c flag is never delivering. (Or actually, it will cause Procmail to fork; one copy of Procmail continues processing almost as if the recipe with the c flag hadn't been there. So if you add a c flag to a recipe which wasn't delivering in the first place, you will have two copies of the same message!)
Other than that, any recipe which forwards or stores the message is delivering. Anything else is by definition not delivering; variable assignments, filtering actions (recipes with an f flag), unsuccessful delivery attempts (permission denied, syntax error), recipes whose conditions don't match -- these are not delivering actions, so Procmail will continue processing with the next recipe until it reaches end of file. (Remember, Procmail finally delivers to whatever the DEFAULT variable is set to if it "falls off" the end of the .procmailrc file.)
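The distinction can be sketched as a tiny .procmailrc in which only one recipe can end processing. The folder names and the X-Seen header are made up for the illustration.

```procmail
# Not delivering: a variable assignment
VERBOSE=no

# Not delivering: the f flag makes this a filter, so the (modified)
# message simply continues to the next recipe.
:0 fhw
| formail -A "X-Seen: yes"

# Not delivering for the main process: c makes a copy, the copy is
# saved in "archive", and the original carries on.
:0 c:
archive

# Delivering: if the condition matches and the write succeeds,
# Procmail stops here.
:0:
* ^Subject:.*test
testing

# Nothing delivered? The message falls off the end and goes to $DEFAULT.
```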
Here again are the questions we had at the beginning.
    :0:
    * ^From [email protected]\.funet\.fi
    kt-info

    :0:     # two conditions: both must be met
    * ^From [email protected]\.com
    * ^Subject:.*transmeta visit
    URGENT.linus
    # (of course it's "torvalds", but don't tell the spammers)

The From pseudoheader is often fairly reliable for this sort of thing; mailing lists frequently also add a List-Info: or similar header.
    :0
    * ^From aila\[email protected]\.fi
    /dev/null

/dev/null is a very blunt solution and very often not the Right Thing. But for killing off a flood of spew from a looping mail relay, it's a good tool.
    :0c     # Copy to secretary
    * ^From.*@microsoft\.com
    * ^Subject:.*handoff
    ! secretary

    :0:     # Store in a Safe Place. Clean out before Christmas
    forwarded.ms
The procmailex manual page has an assortment of autoresponder examples, including a souped-up vacation(1) clone and a simple ftp-by-mail server.
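For a flavor of what such an autoresponder looks like, here is a sketch modeled on the simplest autoreply in procmailex, with a Subject condition added. The address you@example.org, the Subject text, and the reply text are placeholders which you would replace with your own.

```procmail
# Reply to any message whose Subject contains "send info".
# ^FROM_DAEMON keeps us from answering bounces and mailing lists;
# the X-Loop header keeps us from answering our own replies.
:0 h c
* ^Subject:.*send info
* !^FROM_DAEMON
* !^X-Loop: you@example\.org
| (formail -r -I"Precedence: junk" \
    -A"X-Loop: you@example.org" ; \
   echo "Thanks, the information is on its way.") | $SENDMAIL -t
```

The c flag means the original message is still delivered normally after the reply has been sent.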
    :0BDbfw # remove that #[email protected]! annoying Yahoo footer
    * ^Do You Yahoo!\?$
    | sed -e '/^--$/,$d'    # bug: trims starting from the FIRST "--"
Here's some distilled wisdom for you.
config.h
and act accordingly.)
Procmail is <strong><large><blink>NOT</blink></large></strong> a mail transport agent. If your ISP delivers all the mail for your private little domain into a large mailbox and you are trying to split it back into mom's mail and pop's mail and your own mail and your dog's mail, There Will Be Grief. Just find a better ISP.
    unix$ formail -s procmail <large.mbox

See the manual for the formail companion utility for some other nifty tricks. It's a very useful mail manipulation tool in its own right.
There's a lot of material about Procmail out there but it's not very cohesive.
The one single resource you should be aware of if you intend to get serious about Procmail is the mailing list.
Here are some links to get you started:
There's been talk of an O'Reilly book about Procmail from time to time but it just never seems to materialize.
(Quite frankly, even I sent them a manuscript proposal once. They just sneered at it :-)
In addition, here's a bit of self-promotion: