A month ago, I re-evaluated the use of SPF as an anti-spam measure and found it wanting. Today, I decided to re-evaluate my stance on the various real-time blackhole lists that exist. I was relunctant to use an RBL because of over-aggresive classification for even the smallest of infractions could lead to false positives (wanted email being rejected as spam). It has been over a decade since I first rejected the idea, and I was curious to see just how it would all shake out.
I used the Wikipedia list of RBLs as a starting point, figuring it would be pretty up-to-date. I then dumped information from my greylist daemon. The idea is to see how much additional spam would be caught if, after getting a “GO!” from the greylist daemon, I do a RBL check.
Out of the current 2,830 entries, only 145 had not been whitelisted. I didn't filter these out before running the test, but I don't think it would throw off the results too much. Half an hour of coding later, and I had a simple script to query the various RBLs for each unique IP address (1,446). I let it run for a few hours, as it had quite a few queries to make (1,446 IP addresses, each one requiring one query to see if the IP address is a known spammer, and a possible second one for the reason, across 45 RBL servers—it took awhile).
First up, how many “spam” results did I get from each RBL:
As you can see,
some of them were not worth querying.
it's not straightforward to use that server
as it always sent back a result even when the others did not.
I ultimately decided that any result that only had a “hit” from
list.quorum.to to be “non-spam” because of the issues.
I then proceeded to pour through all 2,830 results.
|Marked as SPAM||2739||97%|
|Not marked as SPAM||91||3%|
And out of the 91 that was not marked as spam, only 7 were spam not marked by any of the RBLs. Not bad. But the real test is false positives—email marked as spam that isn't. And unfortunately, there were a few:
Anyway, that's a problem for me. I will occasionally have issues with the greylisting in some cases (rare, but it does happen, and I have to explicitely authorize the email when I become aware of the issue) but it's even worse with this. For instance:
It's hit-or-miss within the IP range Facebook uses to send email. This would make troubleshooting quite difficult. I could whitelist the problematic domains but for any new site I might want to receive email from, I would have to watch the logs very closely for issues like this. But it's not as bad as I thought it would be, and it would cut out a lot of the spam I do get. It's tempting.
I shall have to think about this.