Sunday, April 12, 2015
For me, SPF is not that great an anti-spam measure
I was reminded of SPF the other day. It's an anti-spam measure, primarily meant to help identify email spoofing. I set up an SPF record on my domain years ago, but other than that, I haven't really done anything with it. Being reminded of it, though, I thought it might be a good idea to see just how effective it could be. I'm already using a greylist daemon to cut down on spam but hey, the more spam that is caught, the better.
First step—just how effective is the greylist daemon? I have the logs from the greylist daemon for the past month (March 15th to April 11th). Some processing on the logs and I have my answer:
| Disposition | Count |
|---|---|
| Emails accepted | 5,028 |
| Emails rejected | 5,132 |
| Total | 10,160 |
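For the curious, the log crunching amounts to little more than counting dispositions. Here's a minimal sketch in Python, assuming a hypothetical log format where each final disposition line contains the word "accepted" or "rejected" (my greylist daemon's actual format differs, and the filename is made up):

```python
# Tally accepted vs. rejected deliveries from a greylist daemon log.
# Assumes a hypothetical format where each disposition line contains
# the word "accepted" or "rejected"; adjust to the real log format.
from collections import Counter

def tally(logfile):
    counts = Counter()
    with open(logfile) as log:
        for line in log:
            if "accepted" in line:
                counts["accepted"] += 1
            elif "rejected" in line:
                counts["rejected"] += 1
    return counts

if __name__ == "__main__":
    counts = tally("greylist.log")   # hypothetical filename
    total = counts["accepted"] + counts["rejected"]
    print(counts["accepted"], counts["rejected"], total)
```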
Wow!
A bit over 50% of the emails received by my email server are spam. I'm not sure if I should be depressed that easily half the email addressed to me is spam, or happy that the greylist daemon is an easy way to avoid false positives. I suppose both are in order.
Now, I still get spam despite the greylist daemon, but all that means is that the sender is actually bothering to follow the SMTP protocol—not that high a bar. So, of the emails that do get through, how many would get flagged by SPF? Okay, time to check up on the SPF specification, and boy, is that a mess. The grammar to parse SPF records requires backtracking (lovely!—and lest you think a message from 2010 has any relevance to a 2014 standard, think again; the grammar didn't change all that much), and it isn't entirely context free either (sigh—one letter in the macro-expansion has two meanings depending on where it appears).
Oh, and that grammar? It actually covers three different grammars—one for parsing the SPF record itself, a second for parsing an email header, and a third for a secondary text string obtained via a separate DNS query (the SPF record itself is obtained via a DNS query, by the way).
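To make that concrete: an SPF policy is published as a DNS TXT record starting with `v=spf1`, something like `v=spf1 ip4:192.0.2.0/24 include:_spf.example.com ~all`. Here's a minimal sketch of fetching one, using the third-party dnspython package (an assumption on my part; version 2 or later for `resolve()`, and `example.com` is just a placeholder domain):

```python
# Fetch the SPF policy (a TXT record starting with "v=spf1") for a domain.
# Uses the third-party dnspython package (>= 2.0 for dns.resolver.resolve).
import dns.resolver

def spf_record(domain):
    try:
        answers = dns.resolver.resolve(domain, "TXT")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return None
    for rdata in answers:
        # A single TXT record may be split into multiple character strings.
        txt = b"".join(rdata.strings).decode("ascii", "replace")
        if txt.lower().startswith("v=spf1"):
            return txt
    return None

print(spf_record("example.com"))   # placeholder domain
```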
Okay, so munging the grammar into what I think is intended and leaving out what I don't need, I went through the log file and, for each accepted email, did an SPF check according to the specification. Granted, the data I get now might not reflect what the results would have been at the time each email arrived, but it should give me a baseline to go by.
For the test, I pulled out all emails accepted (5,028) and removed those I explicitly allowed (for example, accept anything from a given IP address, or from a given domain) or that did not have a sender address (allowed by the SMTP protocol to prevent mailing loops when generating mail bounce messages), leaving me with 4,343 emails. Then, for those, I looked up the SPF record for the given domain, and if it had one, applied its policy.
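A rough sketch of the per-email check, using the third-party pyspf module (imported as `spf`)—the sample entry and the assumption that `spf.check2()` returns a result string plus an explanation are mine, not something taken from my actual test harness:

```python
# For each accepted email, run an SPF check against its client IP,
# envelope sender, and HELO name, and tally the results.
# Uses the third-party pyspf module ("spf"); the entries list below is
# a stand-in for whatever the log parsing produced.
import spf  # pip install pyspf

entries = [
    # (client IP, envelope sender, HELO name) -- illustrative values only
    ("192.0.2.10", "alice@example.com", "mail.example.com"),
]

results = {}
for ip, sender, helo in entries:
    result, _explanation = spf.check2(i=ip, s=sender, h=helo)
    results[result] = results.get(result, 0) + 1

print(results)   # e.g. counts keyed by 'pass', 'fail', 'softfail', 'neutral'
```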
The 4,343 accepted emails came from 1,000 unique domains, of which only 433 had an SPF record. Okay, 43% of the domains have an SPF record. And those domains account for only 629 of the accepted emails, meaning only 12½% of all accepted incoming email could be checked via SPF. Sigh.
But of those that were checked via SPF, how did we fare? Were a lot spam? Or were most acceptable forms of email per SPF policy?
| Result | Count | Meaning |
|---|---|---|
| fail | 43 | IP address was not allowed to send this email |
| softfail | 53 | IP address should not be sending this email (used for testing) |
| neutral | 90 | IP address has no policy |
| pass | 443 | IP address is allowed to send this email |
A 70% pass rate for SPF. Only 43, or almost 1% of the accepted email (around two per day), could have been deleted as spam. Another 53 maybe, possibly, could have been deleted as spam. And for 90, no idea one way or the other. Sigh.
You want to know what has a better rate of catching spam than SPF for my email? Any email addressed to my domain registration address that isn't from the registrar. For me, I don't think it's worth the effort to implement SPF checking.