procmail and perl

I've reproduced Joe's e-mail here (with his permission). I was casting about for something better than the usual u.*x digestive noise options, but Joe's e-mail illustrates why those are still with us. His hack works; so much for elegance! I still have hopes for a stream processor that understands the logical structure of e-mails, but hey, it's not like anybody seriously pays me for writing about procmail... (send me a check; what do you want me to write?)
I would call both of these "serious hacks". I have hyperlinked some explanations of the technical issues. I did that for you nontechnical folk; I know you visit, although I don't have much of a clue why.
> I am going to be investigating what sort of filtering/rewriting options are
> available to disable this sort of thing. Procmail will catch it, but won't
> rewrite it.

Procmail can catch and can most certainly rewrite the message body in a way that disables most of the HTML junk you may receive...
# In its simplest form, this might do it for you:
:0 HB
*text/html
{
:0 fw
|/bin/sed 's/text\/html/text\/plain/'
}
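To see what that filter does before trusting it with real mail, the same sed command can be run by hand on a sample message. This is only a minimal sketch: the message text and the `sample_msg` and `filtered` variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample message; only the Content-Type line matters here.
sample_msg='From: someone@example.com
Subject: test
Content-Type: text/html

<html><body>Hello</body></html>'

# Same substitution the recipe pipes the message through.
filtered=$(printf '%s\n' "$sample_msg" | sed 's/text\/html/text\/plain/')

# Show the rewritten message; the Content-Type header now says text/plain.
printf '%s\n' "$filtered"
```

Because the recipe's `fw` flags pipe the whole message through sed, a body line that happens to mention text/html is rewritten as well, not just the header.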
Of course, you could rewrite that to modify almost any text string, including looking for img src tags and altering them. You might start with the following rudimentary recipe...
# Long lines ahead
# The brackets in [ ]+ should hold a space followed by a literal tab;
# retype the tab if it was lost in copying.
# Replace all IMG SRC tags with <removed image> tags.
# [^>]* keeps each match inside a single tag.
:0 HB
* [<][^>]*IMG([ ]+)SRC.([^>]+)
{
:0 fw
|/usr/bin/perl -p -e 's/<[^>]*IMG[ \t]+SRC.[^>]+>/<removed image>/gi;'
}
As you can see - there are several different ways to alter the body of an email using procmail (combined with perl, sed, and awk, procmail becomes quite robust).

> Is qmail capable of rewriting the message body?

We don't utilize qmail here, so we're not certain if it can or can't.

> I could write a small c program to do it: do both of you have c
> compilers on your systems?

Yes, both C and C++ compilers are available on the shell.

> I have a parser toolkit already written in Java: do I have the
> capability to run a Java application from the command line on your
> systems?

Java is not installed (nor will it be) on our shell server.

> Any other thoughts, observations, comments, etc.?

The easiest way to disable most of the HTML bugs and nasties you might receive, without having to resort to fancy and imperfect filtering, is to use an email program that does not render (or only partially renders) HTML - like Pine (a *Nix mail client, although a PC-Pine version is available for Windows). It can render HTML as rich text but can't do images or Java/JavaScript/ActiveX/cookies that can be used to play with your inbox.
After this page went up, Joe took a look at it and my trick for testing perl recipes on the command line and had this to say:
The perl resource utilization wouldn't be too bad for most inboxes,
depending of course on how many emails with images in them you get since
the rule doesn't "fire" unless it finds one.
You mention testing your procmailrc rules with perl. Perl and Procmail
(usually) use different regex libraries, so a regular expression match in
procmail won't necessarily match in perl and vice-versa. The best way to
check a procmailrc file is to use procmail itself. I didn't see this
mentioned elsewhere on your guide, and it's definitely handy for tweaking
rules...
Set up your test recipe within a separate file from your main
.procmailrc. Name it 'testing.rc' or something similar. Here's a sample:
---snip----
LOGFILE=$HOME/testing.log
MATCHFOLDER=$HOME/testing.matched
VERBOSE=off
DEFAULT=/dev/null
LOGABSTRACT=no
CRLF="
"
# Sample test recipe
:0
*^From(.+)joe@blarg\.net
{
LOGABSTRACT=yes
LOG="This email is from Joe. Joy and Happiness abound!"
LOG="$CRLF"
:0
$MATCHFOLDER
}
---snip----
Have your test recipe save matching emails to your test folder and delete
any non-matching emails. (hence the DEFAULT=/dev/null assignment)
Then from the shell, invoke your test recipe with the following command:
formail -s procmail -m testing.rc < $HOME/mail/spam
This invokes 'formail', which will take your test mail folder (like one of
my Spam folders), split out each individual email from the folder, and
process that email using your testing.rc file. Emails that match will be
logged in 'testing.log' and stored in 'testing.matched'. Emails that pass
through the recipe are discarded and not logged. This way when you're done
fine-tuning your recipe you know for certain that it's going to work the
way you intend it to.
That is a cool test jig, and useful for testing all procmailrc files.
-- Joe "support is my life" @blarg.net Blarg! Online Services, Inc.
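If you would rather not point the jig at a real mail folder the first time, a throwaway mbox is easy to build. This is a minimal sketch under assumptions: the file path, the second address, and the `test_mbox` and `msg_count` names are hypothetical, and the jig only runs if formail, procmail, and a testing.rc in the current directory are actually present.

```shell
#!/bin/sh
# Build a throwaway two-message mbox (hypothetical path and addresses).
test_mbox="${TMPDIR:-/tmp}/jig-sample.mbox"

cat > "$test_mbox" <<'EOF'
From joe@blarg.net Mon Jan  1 00:00:00 2001
From: joe@blarg.net
Subject: should match

Hello from Joe.

From other@example.com Mon Jan  1 00:00:01 2001
From: other@example.com
Subject: should not match

Hello from someone else.
EOF

# Each message in an mbox starts with a "From " separator line.
msg_count=$(grep -c '^From ' "$test_mbox")
echo "sample mbox holds $msg_count messages"

# Run the jig only if the tools and the rc file are really there.
if command -v formail >/dev/null 2>&1 &&
   command -v procmail >/dev/null 2>&1 &&
   [ -f testing.rc ]; then
  formail -s procmail -m testing.rc < "$test_mbox"
fi
```

With Joe's sample recipe, only the first message should end up in testing.matched; comparing that against what you expected is the whole point of the jig.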