The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, April 12, 2015

For me, SPF is not that great an anti-spam measure

I was reminded of SPF the other day. It's an anti-spam measure, primarily to help identify email spoofing. I set up an SPF record on my domain years ago, but other than that, I haven't really done anything else with it. But being reminded of it, I thought it might be a good idea to see just how effective it could be. I'm already using a greylist daemon to cut down on spam but hey, the more spam that is caught, the better.

First step—just how effective is the greylist daemon? I have the logs from the greylist daemon for the past month (March 15th to April 11th). Some processing on the logs and I have my answer:

Unique emails processed by the greylist daemon
Emails accepted5,028
Emails rejected5,132
Total10,160

Wow!

A bit over 50% of the emails received by my email server are spam. I'm not sure if I should be depressed that easily half the email addressed to me is spam, or happy that the greylist daemon is an easy way to avoid false positives. I suppose both are in order.

Now, I still get spam despite the greylist daemon, but all that means is that the sender is actually bothering to follow the SMTP protocol—not that high a bar. So, of the emails that do get through, how much would get flagged with SPF? Okay, time to check up on the SPF specification, and boy, is that a mess. The grammar to parse SPF records requires backtracking (lovely!—and lest you think a message from 2010 has any relevence to a 2014 standard, think again; the grammar didn't change all that much) and not entirely context free either (sigh—one letter in the macro-expansion has two meanings depending on where it appears).

Oh, and that grammar? It's actually covers three different grammars—one for parsing the SPF record itself, a second one to parse an email header, and the third a secondary text string via a secondary DNS query (the SPF record itself is obtained via a DNS query, by the way).

Okay, so munging the grammar to what I think is intended and leaving out what I don't need, I went through the log file and for each accepted email, did an SPF check according to the specification. Granted, the data I get now might not reflect the results made at the original time, but it should give me a baseline to go by.

For the test, I pulled out all emails accepted (5,028) and removed those I explicitly allowed (for example, accept anything from a given IP address, or from a given domain) or that did not have a sender address (allowed by the SMTP protocol to prevent mailing loops when generating mail bounce messages), leaving me with 4,343 emails. Then, for those, I looked up the SPF record for the given domain, and if it had one, applied its policy.

The 4,343 accepted emails came from 1,000 unique domains, of which only 433 had an SPF record. Okay, 43% of the domains have an SPF record. And of the domains that had an SPF record, only 629 emails accepted were checked. Or 12½% of all accepted incoming emails could be checked via SPF. Sigh.

But of those that were checked via SPF, how did we fare? Were a lot spam? Or were most acceptable forms of email pef SPF policy?

Results of applying SPF policy against incoming email
fail43IP address was not allowed to send this email
softfail53IP address should not be sending this email (used for testing)
neutral90IP address has no policy
pass443IP address is allowed to send this email

A 70% pass rate for SPF. Only 43, or almost 1% (or around two per day) could have been deleted as spam. Another 53 maybe, possibly, could have been deleted as spam. And 90 no idea one way or the other. Sigh.

You want to know what has a better rate of catching spam than SPF for my email? Any email addressed to my domain registration email not from the registrar. For me, I don't think it worth the effort to implement this.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.