Introduction to Procmail

http://www.iki.fi/era/mail/procmail-presentation.html
$Id: procmail-presentation.prep,v 1.14 2000/09/18 09:53:45 era Exp $

This is a short presentation of Procmail. It is intended for use as a handout in conjunction with a performance of some sort (preferably with a screen projector and a nice Unix prompt), but you may well enjoy it no less without the performance ...

Prerequisites: A little bit of shell programming and a willingness to experiment.

This is strictly from an end-user perspective; mail administration on a system-wide scale is totally outside the scope of this presentation.

What is Procmail?

Procmail is a tool for processing mail. It can be used to dispatch your mail or run a script on incoming messages which match a certain pattern. Most frequently it's probably used for rejecting spam and setting up various autoresponders. But it's an extremely versatile tool which isn't even restricted to use with mail (although you have to have a taste for the curious to really appreciate the other possibilities).

The orientation of this presentation is to give an overview, with glances at some of the more powerful features of Procmail. Pointers to self-study material will be given along the way.

Overview

Here's a quick breakdown of today's topics.

Starting points -- questions

Alcohol is the answer, but I can't remember the question

If you're just curious, here are some questions which could be answered with "Procmail". If you never ask yourself these types of questions, perhaps Procmail is not for you.

Starting points -- answers (maybe)

Procmail might be the answer if

Certainly some canned solutions are available, so if your needs are simple and straightforward, you can probably just copy somebody's existing Procmail files and take it from there.

If something goes wrong, you probably need to understand something about files and the Unix file permission system, as well as be able to ask your system administrator moderately intelligent questions.

Procmail terminology: Sorting, filtering, forwarding

To return to our questions above, here's how Procmail can help you with the problems they were about.

Procmail terminology (continued)

Mail terminology: MTA / MDA / MUA

It's beneficial to understand how Procmail fits into a larger picture before we look at how to do things in practice.

Here's a standard reference model of a mail system. It has three fairly independent components:

              |  Remote MTA  |
              +--------------+
                    ||
        Internet   \  /
                    \/
              +--------------+
--------------|     MTA      |----------------------
              +--------------+
               ||          ||
               \/          \/               +-----+
             +-----+     +-----+            | MUA |<----- user
             | MDA |     | MDA |            +-----+
             +-----+     +-----+               ^
                |           |                  |
mailbox file <--+           +--> other types of delivery? (IMAP?)
These are the Mail Transport Agent, the Mail Delivery Agent, and the Mail User Agent. Each one of these has a distinct role.

Mail terminology (continued)

Here's a brief discussion of the acronyms on the previous slide:

Mail Transport Agent (MTA) -- Sendmail, ZMailer, Qmail, Postfix
This program is responsible for keeping track of how to move mail across the Internet and to other networks (X.400, BITNET, etc).
Mail Delivery Agent (MDA) -- (Sendmail), Filter, Procmail; deliver(8)
This program receives mail from the MTA and from local users and takes care of storing it where the recipient wants it.
Mail User Agent (MUA) -- Pine, Elm, Mutt; Netscape; MH; Emacs RMAIL, Gnus; mailx(1)
A user interface to the mail system, used for reading, writing, and manipulating mail interactively.

The standard model has a few more elements which are, however, less central to this discussion. The ZMailer docs have a more complete picture as well as a broader discussion.

A corollary to this picture is that Procmail is basically useful only for processing incoming mail, although if your outgoing mail processing has hooks to invoke a script on outgoing mail, Procmail will work perfectly for that as well. But such hooks are not standard in mainstream mail systems today.

(A note for the curious: Sendmail is properly an MTA but it has built-in delivery functions and you can use it from the command line to send mail, so it's even a primitive MUA.)

Invoking Procmail

In practice, Procmail is rarely invoked from the Unix prompt, except for testing your Procmail settings. By its nature, it's an autonomous program which is intended to be run by your MTA (Sendmail, Qmail, ...) each time you receive a new mail message. Procmail is then responsible for delivering the message. Its power comes from the great flexibility it offers compared to regular MDAs, which typically only handle simple delivery to a given file (/var/spool/mail/era or an IMAP folder somewhere or something like that).

With Sendmail, Procmail can be installed as everybody's MDA (site-wide installation) but if that's not the case, Sendmail lets each user run arbitrary delivery scripts via the $HOME/.forward file.

The "standard" .forward file for invoking Procmail is fairly complicated, but people often just copy and paste it from the manual page. Remember to change the last part to your own login name!

(The manual page is different on different sites, but the usual variations are explained in http://www.iki.fi/era/procmail/mini-faq.html#forward in some [too much?] detail.)

Invoking Procmail (continued)

Regardless of how exactly Procmail is invoked, it goes through something like the following:

If you have no recipe files, it should work exactly as if Procmail wasn't there.

(This also means that you can figure out whether or not you need a .forward: If strange things start happening when you create $HOME/.procmailrc, Procmail is already being invoked without your knowledge.)

Note that both Sendmail and Procmail are quite easily offended when it comes to who is allowed to read and write these files. As a rule, you should always take care that your home directory and your dot files are not writable by anybody else. (Whether you make them readable is up to you. Sendmail usually runs under your own user ID by the time it reads any dot files of yours, so these files should not need to be world-readable.)

Anatomy of a mail message

Before we can start tackling mail handling, we also need to talk about the parts of a mail message. These are what Procmail fundamentally works with to decide how to deliver a message, guided by the rules you write.

A mail message is divided into headers and body. The headers contain address and route information and the Subject line. The body is the actual text of the message.

The body is completely free-form, while the headers are fairly rigid. Each header consists of a keyword (possibly several tokens with hyphens between them) followed by a colon, and then the value of the field. If the value is long, it may be split over many lines so that each continued line starts with a space or tab character.

(There's a single empty line after the header, with no space or tab on it, which separates the header from the body; this is occasionally called the neck.)

Anatomy of a mail message (continued)

Here's a minimal example message:

	From era  Fri Jul 14 09:29:20 2000
	Received: from schildt.ling.helsinki.fi by stoker.lingsoft.fi
	Received: from localhost by schildt.ling.helsinki.fi
	From: era eriksson <era@iki.fi>
	To: era eriksson <reriksso@lingsoft.fi>
	Subject: Finally something even you can understand!
	Date: Thu, 13 Jul 2000 20:42:12 +0300 (EET DST)
	Message-Id: <moomania-03945@schildt.ling.helsinki.fi>

	Moo.

	The interesting headers appear highlighted.
Many MUAs will hide away a lot of the headers, because many of them are extremely uninteresting unless you try to debug the mail system itself.

The very first line, the "From" without a colon, isn't part of the headers really. It's a separator string which marks the beginning of a new message in this particular file storage format. This is stored as a "Berkeley" or "mbox" message, which is probably the most widespread format in use today. (If you have your mail in /var/spool/mail/you it's probably in Berkeley format.)

Anatomy of a mail message (continued)

Note that the order of headers is by and large insignificant. From, To, and Subject can occur in any order, and there may be other headers in between.

Incidentally, Procmail does not make any particular assumptions about the contents of the headers (and neither should your recipes make such assumptions, hopefully). In other words, Procmail doesn't care if the headers contain 8-bit (or 16-bit or 32-bit) data -- in violation of RFC 822 -- nor does it try to decode or canonicalize or otherwise modify input data in any way (though you can certainly implement recipes to achieve such canonicalizations if you want them). Similarly, MIME headers and data are fundamentally just ASCII, and you can write regular expressions to identify the "magical" MIME stuff just as well as any other predictably formatted strings. (However, MIME is a complex beast, and covering all possible variants of how MIME may validly encode the same message in different ways is more frustration than fun.)
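
As a minimal sketch of what "just ASCII" means in practice, here is a recipe which files anything declaring a multipart MIME type in its headers. (Recipe syntax is explained further down; the folder name multipart is invented for illustration, and this does not attempt to cover every valid way of writing the header.)

	:0:
	* ^Content-Type:.*multipart/
	multipart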

Recipe syntax

The rules for what to deliver where are called recipes. The syntax is vaguely related to shell scripts in some ways, but basically, the formalism is completely unlike anything else. It takes some getting used to, but it's fairly succinct and surprisingly simple to understand once you do.

In abstract terms, each recipe has a mode, some conditions (optionally), and an action.

If the conditions are left out, the recipe is unconditional, obviously.

Recipes are read from top to bottom. The first delivering recipe terminates the delivery process (unless you specify otherwise with the mode flags).

Conditions are typically regular expressions, although there are other interesting possibilities as well.
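
A few of those other possibilities, as a rough sketch (the folder names, the size limit, and the file $HOME/.on-holiday are all invented for illustration): a condition can compare the size of the message against a number of bytes, look at the exit code of a shell command, or negate a regular expression with an exclamation mark.

	# Messages larger than 100000 bytes
	:0:
	* > 100000
	big

	# Matches only if the command after the ? exits with status 0
	:0:
	* ? test -f $HOME/.on-holiday
	holiday

	# Negation: matches if there is no Subject: header at all
	:0:
	* ! ^Subject:
	nosubject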

Incidentally, if you have read this far without losing interest, but you don't know what a regular expression is, you'd do well to read a basic Unix book. It will tell you what you need to know in order to get started with Procmail.

Recipe syntax (continued)

Here's a fairly standard simple .procmailrc file.

	SHELL=/bin/sh   # Good habit to always have this
	MAILDIR=$HOME/Mail
	LOGFILE=$HOME/Mail/procmail.log

	:0:
	* ^Subject: test
	testing
The first three lines are variable assignments, not recipes. The following chunk of three lines is a simple recipe.

The recipe has a condition, which says that the action should be taken only if the message being processed has a header matching the regular expression

	^Subject: test
The action is to save the message to the mailbox file (or "folder") testing. This file doesn't have an absolute pathname, so it will be created in the directory named in the MAILDIR variable.
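
Two variations on this recipe, as a sketch (the folder names are invented for illustration). A recipe can carry several condition lines, in which case all of them have to match; since the order of the headers in the message is insignificant, so is the order of the conditions. And if the folder is given with an absolute pathname, MAILDIR doesn't enter into it.

	# Both conditions must match for the message to be saved in $MAILDIR/moo
	:0:
	* ^From:.*era@iki\.fi
	* ^Subject:.*moo
	moo

	# $HOME expands to an absolute path, so this is not relative to MAILDIR
	:0:
	* ^Subject: test
	$HOME/testing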

Recipe syntax (continued)

Here's that recipe again:

	:0:
	* ^Subject: test
	testing	
Notice the following details:

Flags

The colon line can carry a lot more information than just a single colon. This is probably the hardest part of Procmail to learn because the flags are single-letter options -- sometimes with no meaningful mnemonic -- although they can have a quite significant impact on what a recipe actually does.

Here are some useful flags:

B
Match against the body of the message instead of the headers.
HB or BH
Match against both the headers and the body.
D
Case is significant when matching. (Default is don't care. Mail headers can be mixed case according to RFC822 so normally you shoot yourself in the foot if you treat case as significant.)
c
Clone: Even if this recipe is a delivering one, continue processing as if the message was still undelivered.
This is the key to doing more than one thing to a message.
f
Modify ("filter") the message and continue processing with the modified version. (This is "filtering" because you can run filtering pipelines like tr A-Z a-z to turn all text into lowercase, for example.) This obviously cannot be a delivering recipe.
Note that some of these are uppercase while others are lowercase. The b flag does something different from the B flag.

You get a brief listing of all the flags with procmail -h at the Unix prompt.
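
Here is a small sketch of how three of the flags above behave (the folder names and the address are made up for the example):

	# c: save a copy in the folder "archive", then carry on as if
	# nothing had been delivered yet
	:0c:
	archive

	# B: the regular expression is matched against the body, not the headers
	:0B:
	* unsubscribe
	maybe-list

	# f: replace the message with the output of the command. The extra
	# b flag restricts this to the body and w waits for tr to finish,
	# so the headers are left alone. Not delivering; processing continues.
	:0fbw
	| tr A-Z a-z

	# By now the body is all lowercase; this one finally delivers
	:0
	* ^Subject:.*urgent
	! pager@example.com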

Flags (continued)

Here's an example of a real-life recipe combo with a lot of flags:

	:0fhw   # Simplify headers by folding any continued lines back
	* ^[^ 	:]*:[ 	][ 	]|$[ 	]
	| formail -cz

	:0Ahcw: # If the previous recipe matched, save a copy of headers
	$HOME/headers.log

	# Processing continues here even though previous recipe "delivered"
	:0      # Pass this message on to Steve
	! steve@example.com
This also demonstrates the other two types of actions: Feeding a message to an arbitrary shell command (signalled by the pipe character |), and forwarding a message to another address (signalled by the exclamation mark !).

(Decoding the finer points is left as an exercise :-)

Regular expressions

Procmail's regular expression support is fairly good, although it's not quite as supercharged as Perl's. The closest related regex flavor is probably Egrep's, although Procmail has some interesting features (and some not so stunning quirks).

For a beginner, the important thing is to realize that the condition has to match exactly. If you write

	* ^From: fred flinthe
then this condition will not match if there are two spaces after the colon, or if Fred changes his email program's setup so that it reads "Fred F. Flinthe" in the From: field instead, or if the header had been
	From: fredf@stonera.example.net (Fred Flinthe)
all along.

(And don't forget the leading star which signals the start of a condition. Leaving it out produces some hilarious / hysterical mystery errors which can be very hard to figure out.)

Regular expressions (continued)

The other source of newbie confusion comes from not realizing that .* (dot star) is the "match anything" wildcard in regex-ese (not just star, and not whatever else happens to work in INTERCAL or Visual Basic or JavaScript).

... And/or not realizing that * and . (full stop) and [ and ] and ( and ) and ? and all their friends have a special "magical" meaning in regex-ese. So if you want to match any of them literally, you need to put a backslash in front.

The next step after you've learned the absolute basics -- for some users, at least -- is the hypercorrectness stage, where you backslash everything which "looks magical" just to be on the safe side. This is dangerous, too, at least if taken to extremes.
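
To make the last few points concrete, here is a sketch (the list address and the folder names are invented). The first recipe uses .* to skip over whatever sits between the colon and the interesting part, so it matches all three variants of Fred's From: line shown earlier. The second one backslashes only the characters which really are magical.

	# .* soaks up anything between "From:" and "flinthe"
	:0:
	* ^From:.*flinthe
	fred

	# The square brackets and the dot are meant literally, so they get
	# a backslash; the @ and the hyphen are not magical and get none
	:0:
	* ^Subject:.*\[procmail-l\]
	* ^From:.*@lists\.example\.org
	list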

Procmail regex peculiarities

The only really unusual things in Procmail's regex support are the \< and \> "word boundary" operators, which are actually just a shorthand for a character class; the predefined macros ^TO_ and ^FROM_MAILER and ^FROM_DAEMON; and the weird \/ grab operator (highly magical -- pity we don't have real backrefs, though).
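
A sketch of two of these in action (the list address, the folder name, and the variable name SUBJECT are invented for the example). Whatever matches to the right of \/ is made available in the MATCH variable. This is also one reason to go easy on the backslashes: in Procmail, a backslash in front of a slash or an angle bracket turns it into one of these operators rather than making it literal.

	# ^TO_ covers To:, Cc: and the other usual recipient headers
	:0:
	* ^TO_procmail@lists\.example\.org
	procmail-list

	# Grab the text of the Subject: header, minus leading whitespace,
	# and keep it in a variable for later use. (The brackets contain
	# a space and a tab.)
	:0
	* ^Subject:[ 	]*\/[^ 	].*
	{
		SUBJECT=$MATCH
	}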

Another slightly unusual, but highly convenient feature -- which you will find is quite natural if you're used to working with regular expressions -- is the behavior of the $ and ^ line anchors: they actually match a literal newline if you use them anywhere inside a regular expression.

Finally, there is also a ^^ anchor which is similar to Perl's "beginning/end of search space" operator, which matches on the beginning/end of the header, or body, or whatever you are matching on. (You can match on the values of variables, too.)

That's right, you can match multi-line values with a single regular expression. Here's an example which adds a Secret Subliminal Header to messages with a lot of empty lines (i.e. adjacent newlines) right at the top of the body (only incoming messages, fortunately).

	:0Bfhw     # Look for a lot of empty lines at the top of the body
	* ^^$$$$$
	| formail -a "X-Plode: Poor Netscape luser, I pity you."

Deliveredness

Another concept which is tricky to learn is deliveredness. This is a fundamental thing to understand because it determines which recipes will be activated when.

The first successful delivering recipe will cause Procmail to terminate.
A delivering recipe is like exit 0 in a shell script -- it says "we're done". If that's not what you want to say, you need to know how to phrase your recipe differently.

A recipe with a c flag is never delivering. (Or actually, it will cause Procmail to fork; one copy of Procmail continues processing almost as if the recipe with the c flag hadn't been there. So if you add a c flag to a recipe which wasn't delivering in the first place, you will have two copies of the same message!)

Other than that, any recipe which forwards or stores the message is delivering. Anything else is by definition not delivering: variable assignments, filtering actions (recipes with an f flag), unsuccessful delivery attempts (permission denied, syntax error), recipes whose conditions don't match -- these are not delivering actions, so Procmail will continue processing with the next recipe until it reaches end of file.

(Remember, Procmail finally delivers to whatever the DEFAULT variable is set to if it "falls off" the end of the .procmailrc file.)
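
Putting these rules together, here is an annotated sketch of a small .procmailrc (the folder names and the X-Seen-By header are invented for illustration):

	MAILDIR=$HOME/Mail
	DEFAULT=$MAILDIR/inbox

	# Not delivering: just a variable assignment
	VERBOSE=no

	# Not delivering: the f flag only rewrites the message (here, the
	# headers get one more line), and processing continues
	:0fhw
	| formail -A "X-Seen-By: procmail"

	# Not delivering: because of the c flag, a copy goes to the folder
	# "everything" and processing continues
	:0c:
	everything

	# Delivering, but only if the condition matches and the message can
	# be written to the folder; if so, Procmail stops here
	:0:
	* ^Subject:.*moo
	moo

	# Anything which gets this far falls off the end and goes to $DEFAULT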

Questions revisited, with answers

Here again are the questions we had at the beginning.

Questions revisited (continued)

Questions revisited (continued)

Common pitfalls

Here's some distilled wisdom for you.

Links

There's a lot of material about Procmail out there but it's not very cohesive.

The one single resource you should be aware of if you intend to get serious about Procmail is the mailing list.

Here are some links to get you started:

Procmail Quick Start by Nancy McGough
http://www.ii.com/internet/robots/procmail/qs/
Procmail home page
http://www.procmail.org/
Your one-stop shop for getting the latest version and some essential links.
Searchable archive of the Procmail mailing list
http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/

There's been talk of an O'Reilly book about Procmail from time to time but it just never seems to materialize.

(Quite frankly, even I sent them a manuscript proposal once. They just sneered at me. :-)

Links (continued)

In addition, here's a bit of self-promotion:

http://www.iki.fi/era/procmail/
Procmail FAQ. Also has a companion links page which is fairly extensive; includes links to many antispam packages implemented in Procmail. Also links to FAQs about MIME and other general technical information about email.
http://www.iki.fi/era/mail/procmail-debug.html
Another amusing page about getting away with shooting yourself in the foot.
http://www.iki.fi/era/spam/
Assorted spam-related material. Look in any other directory, too. I've got spam material basically everywhere.
http://www.iki.fi/era/rbl/rbl.html
You want this if you're serious about avoiding spam. Works from Procmail too, if your admin won't cooperate.
