Greylisting spam…

I think I’ve now spent more time in my life than I ever want to think about fighting spam…  I guess that has to do with spending the last 20 years having some involvement in SMTP mail systems.  Of course the stint writing the core infrastructure for MailFrontier’s anti-spam engine didn’t hurt my knowledge base.

My current pain is that I run a server that receives lots of spam, even though there are only three users on the whole machine.  At last check I was getting over 500 pieces of spam a day.  For years I’ve been reasonably happy with a well-trained SpamAssassin setup, but recently that’s started failing.  One of the technologies that I’ve not deployed for anti-spam is greylisting…  So, here are the notes from my days spent getting a good greylisting system in place.

Initial round:

postgrey – solid Perl package, fairly basic in features, did cut down the flow from about 100 missed messages to 40.  Very nice, but in the course of watching it for a few days I noticed that a few classes of senders were making it through.  Not its fault, just a limitation of greylisting itself.  Note: Most of these were large clusters of hosts hosted at The Planet (an ISP known for spam).

greyboa – my quick homebrew system based on postgrey (don’t ask, it was also a project to learn Python asyncore/asynchat).  Worked just about as well as postgrey.  Moved it to be based on SQLite, since during one of the nightly postgrey runs the DB got corrupted, which of course crashed the server, and then I wasn’t getting any mail… sigh..
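For anyone who hasn’t seen the core of a greylister, here is a minimal sketch of the SQLite-backed lookup.  This is not the greyboa code; the table layout, the `check` function and the 300 second delay are all assumptions made up for illustration.  The idea is just: remember the (client IP, sender, recipient) triplet the first time it shows up, defer, and accept once the same triplet retries after the delay.

```python
import sqlite3
import time

DELAY_SECONDS = 300  # assumed minimum wait before a retry is accepted


def open_db(path=":memory:"):
    """Open (or create) the greylist database. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS greylist ("
        " client_ip TEXT, sender TEXT, recipient TEXT,"
        " first_seen REAL,"
        " PRIMARY KEY (client_ip, sender, recipient))"
    )
    return db


def check(db, client_ip, sender, recipient, now=None):
    """Return 'defer' or 'accept' for one delivery attempt."""
    now = time.time() if now is None else now
    key = (client_ip, sender.lower(), recipient.lower())
    row = db.execute(
        "SELECT first_seen FROM greylist"
        " WHERE client_ip=? AND sender=? AND recipient=?",
        key,
    ).fetchone()
    if row is None:
        # First time we see this triplet: record it and ask the sender to retry.
        db.execute("INSERT INTO greylist VALUES (?, ?, ?, ?)", key + (now,))
        db.commit()
        return "defer"
    # Seen before: accept only once the retry comes after the delay window.
    return "accept" if now - row[0] >= DELAY_SECONDS else "defer"
```

A real policy server would also expire old rows and whitelist clients that have passed before; both are left out here to keep the sketch short.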

sqlgrey – the current experiment.  Not that I don’t like my code, but I’m going out of town for a few weeks and don’t want things to die while I’m not watching.  Nice things: better logging than postgrey/greyboa, and it does some good sender (envelope level) matching to notice things like ‘+’ addressing and Y!Groups unique sender patterns, which should reduce the retry behavior a bit.
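To show why sender matching helps, here is a rough sketch of that kind of envelope normalization.  These are not sqlgrey’s actual rules; the `normalize_sender` function and its two rules (drop the ‘+’ extension, collapse runs of digits) are assumptions for illustration only.  The point is that a list server using a fresh sender address on every message still maps to one greylist key, so it only gets delayed once.

```python
import re


def normalize_sender(address):
    """Collapse per-message variations in an envelope sender (hypothetical rules)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No '@' at all (e.g. a bare local name): only lowercase it.
        return address.lower()
    # Drop a '+' extension: user+tag@host becomes user@host.
    local = local.split("+", 1)[0]
    # Replace every run of digits with '#', so bounce-123 and bounce-456 match.
    local = re.sub(r"\d+", "#", local)
    return f"{local}@{domain}".lower()
```

For example, `normalize_sender("Alice+lists@Example.com")` gives `alice@example.com`.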

Here’s my random comment: I’m convinced that while spam is a problem, there has to be a way to solve it…  No, it’s impossible to not have unsolicited messages, but there should be some much better ways to take advantage of the fact that spammers must operate on a shotgun approach to sending messages.  That shotgun leaves a lot of scatter…

Fundamentally messages fall in a few buckets:

  • Immediate — communication between well known parties, with a long history
  • Unknown  — messages between people who don’t have an established relationship, or little history
  • Junk         — clearly “low priority” messages.

We should be able to delay “unknown” messages (greylisting is a good example) for “long” periods of time, until enough history has built up to re-bucket them into one of the other classes.  That’s the basics of what I’m doing with greylisting: delaying a message until RBL/Razor/Pyzor has had a chance to build a little history before I deliver it to my mailbox…
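As a rough sketch of that re-bucketing idea (nothing I actually run; the names, the thresholds and the “release after a maximum hold” rule are all assumptions for illustration):

```python
from enum import Enum


class Bucket(Enum):
    IMMEDIATE = "immediate"
    UNKNOWN = "unknown"
    JUNK = "junk"


KNOWN_AFTER = 5      # assumed: prior exchanges needed to count as "well known"
MAX_HOLD = 4 * 3600  # assumed: longest time to hold an unknown message, in seconds


def rebucket(prior_exchanges, held_seconds, listed):
    """Decide where a held message belongs right now.

    prior_exchanges: how many earlier messages we have traded with this sender
    held_seconds:    how long this message has been delayed so far
    listed:          True if a reputation source (RBL/Razor/Pyzor style) flagged it
    """
    if listed:
        # History built up while we waited, and it was bad.
        return Bucket.JUNK
    if prior_exchanges >= KNOWN_AFTER:
        # Established relationship: no reason to hold the message.
        return Bucket.IMMEDIATE
    if held_seconds >= MAX_HOLD:
        # Held long enough with nothing flagged: let it through.
        return Bucket.IMMEDIATE
    return Bucket.UNKNOWN
```

Run over the hold queue every few minutes, something like this would let the “unknown” pile drain into the other two buckets as history accumulates.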