Sunday, May 10, 2015
Eight years of greylisting
I've been hacking on my greylist daemon over the past few days.
I'm not sure what,
exactly,
prompted me to start hacking away at it though.
The last code change was in December of 2011—all code changes since then have been tweaks to the Makefile
(the file that describes how to build the program).
As I'm hacking on it,
I've come to hate the code handling the protocol the components use to communicate
(there's the main component that manages the data and logic,
one component that interfaces with sendmail,
and another that interfaces with postfix).
And over the past few days, I've reflected on what I would do differently if I were to write the greylist daemon now, and on how well the decisions I made eight years ago have held up.
One decision I made eight years ago was to write my own “key/value” store instead of using a pre-existing one. I rejected outright the use of an SQL database engine (like MySQL or PostgreSQL) and I don't think I would change my mind now. The data stored is short lived (six hours for most entries, otherwise thirty-six days) and I don't think such churn is good for database engines.
In addition,
the only NoSQL-based solution (as they're now called) at the time was memcached
(written in 2003; redis wasn't released until 2009,
two years after I released the greylist daemon).
memcached (and redis) can expire entries automatically,
and either could handle five of the six lookups the greylist daemon makes.
The one lookup neither can handle
(as far as I can tell) is the IP address lookup.
This lookup compares the IP address of the sending SMTP server against a list. Each entry in the list describes an address range and what to do if the given IP address “matches.” For example:
0.0.0.0/0 GREYLIST
205.211.164.50/32 ACCEPT
206.214.64.0/19 REJECT
207.115.11.0/26 ACCEPT
If, say, the IP address is 207.115.11.8, then the email is accepted and further processing is skipped because of the matching rule:
207.115.11.0/26 ACCEPT
An IP address of 206.214.64.10 is rejected, because of the matching rule:
206.214.64.0/19 REJECT
An address like 66.252.224.242 will match the rule
0.0.0.0/0 GREYLIST
and because the result is GREYLIST,
further checks are made.
There does not appear to be a way of handling this type of query using memcached or redis.
I would have to write code to store the IP addresses anyway.
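As a rough illustration of the lookup described above, here is a minimal Python sketch, not the daemon's actual code. It assumes the most specific matching range wins, which is what the three examples imply (the catch-all 0.0.0.0/0 is listed first, yet the narrower rules take precedence). The names RULES, compile_rules and lookup are hypothetical, and a linear scan is used for clarity rather than speed.

```python
import ipaddress

# Rules copied from the example list above; the list-of-pairs layout is an assumption.
RULES = [
    ("0.0.0.0/0", "GREYLIST"),
    ("205.211.164.50/32", "ACCEPT"),
    ("206.214.64.0/19", "REJECT"),
    ("207.115.11.0/26", "ACCEPT"),
]

def compile_rules(rules):
    """Parse each CIDR string once so lookups only do containment checks."""
    return [(ipaddress.ip_network(cidr), action) for cidr, action in rules]

def lookup(table, address):
    """Return the action of the most specific range containing address, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for network, action in table:
        # A longer prefix means a narrower range, so it overrides a wider match.
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, action)
    return best[1] if best else None

TABLE = compile_rules(RULES)
```

With this table, lookup(TABLE, "207.115.11.8") gives ACCEPT, lookup(TABLE, "206.214.64.10") gives REJECT, and lookup(TABLE, "66.252.224.242") falls through to GREYLIST. A production version would more likely use a sorted array or a trie than a scan over every rule.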
Also,
memcached is a pure memory cache: if it crashes,
all the data goes away,
which is something I didn't want to happen
(and remember, at the time I wrote the daemon,
it was really the only key/value store that existed that wasn't an SQL database engine).
So my decision at the time to write my own key/value store wasn't a bad one.
Today?
Today I might consider using redis to store what I could,
but it's another component,
and if it isn't available,
I can't greylist incoming email
(I'd have to allow the email in; fail safe and all that).
Also,
the code I wrote to store the non-IP address data was easy to write.
I dunno.
It's hard to say how I would store the data today.
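To give a sense of how small an expiring key/value store for the non-IP data can be, here is a minimal in-memory sketch in Python. It is not the daemon's code: the class name ExpiringStore, its methods, and the injectable clock are all hypothetical, and only the two lifetimes (six hours and thirty-six days) come from the text above. Unlike the store described in the post, this sketch does nothing to survive a crash.

```python
import time

SIX_HOURS = 6 * 60 * 60              # lifetime of most entries, per the post
THIRTY_SIX_DAYS = 36 * 24 * 60 * 60  # lifetime of the longer-lived entries

class ExpiringStore:
    """In-memory key/value store whose entries vanish after a lifetime."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so tests can control time
        self._data = {}      # key -> (value, expiry timestamp)

    def put(self, key, value, lifetime=SIX_HOURS):
        self._data[key] = (value, self._clock() + lifetime)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            # Expired entries are removed lazily on access.
            del self._data[key]
            return None
        return value

    def sweep(self):
        """Drop every expired entry and return how many were removed."""
        now = self._clock()
        stale = [key for key, (_, expires) in self._data.items() if now >= expires]
        for key in stale:
            del self._data[key]
        return len(stale)
```

A periodic call to sweep() keeps memory bounded when entries are written once and never read again, which is the common case for mail that is never retried.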
The protocol between my components is something I would handle completely differently today. I can't say what the actual protocol would be though.
There are basically two methods of sending data: as a series of values in a fixed order (which is how the protocol works today) or as a series of tagged values, which can appear in any order. The former doesn't really deal well with optional data (you end up tagging such values anyway) while the latter is harder to parse (since the values aren't in a fixed order, you have to deal with missing values in addition to duplicate values).
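The trade-off between the two methods can be sketched in Python. This is an illustration only: the three fields (IP address, sender, recipient), the tag numbers, and the one-byte length prefixes are assumptions, since the post does not describe the real packet layout.

```python
import struct

# Hypothetical tags; the real protocol's fields are not described in the post.
TAG_IP, TAG_FROM, TAG_TO = 1, 2, 3
REQUIRED = {TAG_IP, TAG_FROM, TAG_TO}

def encode_fixed(ip, sender, recipient):
    """Fixed order: each value is a one-byte length followed by its bytes."""
    out = b""
    for value in (ip, sender, recipient):
        out += struct.pack("!B", len(value)) + value
    return out

def decode_fixed(packet):
    """Position alone identifies each value, so parsing is a simple loop."""
    values, pos = [], 0
    for _ in range(3):
        (length,) = struct.unpack_from("!B", packet, pos)
        values.append(packet[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return tuple(values)

def encode_tagged(fields):
    """Tagged: (tag, length, value) triples in whatever order the caller gives."""
    return b"".join(struct.pack("!BB", tag, len(value)) + value for tag, value in fields)

def decode_tagged(packet):
    """Any order is accepted, so duplicates and omissions must be checked."""
    seen, pos = {}, 0
    while pos < len(packet):
        tag, length = struct.unpack_from("!BB", packet, pos)
        if tag in seen:
            raise ValueError(f"duplicate tag {tag}")
        seen[tag] = packet[pos + 2 : pos + 2 + length]
        pos += 2 + length
    missing = REQUIRED - set(seen)
    if missing:
        raise ValueError(f"missing tags {sorted(missing)}")
    return seen
```

The fixed-order decoder has no room for an optional value without changing every peer, while the tagged decoder carries the extra duplicate and missing-value checks mentioned above.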
The biggest issue I have with the protocol now is what I said above: the code that handles the protocol is a mess. It's all over the place instead of in a few isolated routines. That makes updating the protocol (say, adding new fields, fixed or optional) very difficult.
What I would do now is make the protocol-handling portion more modular: a module for version 1.0 of the protocol, a module for version 1.1 of the protocol, and so on. Load them all up, and based upon the version embedded in the packet (something I do have, by the way), farm out the processing to the proper protocol version module. It would make updating the protocol easier to deal with in the codebase. The lack of this approach to the protocol is, I think, the biggest problem with the codebase today.
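The per-version module idea could look something like the following Python sketch. Everything concrete here is an assumption made for illustration: the two-byte major/minor header, the handler names, and the flags byte that version 1.1 adds. The post only says that a version is embedded in the packet.

```python
import struct

def handle_v1_0(payload):
    # Hypothetical 1.0 body: the whole payload is the request.
    return {"version": (1, 0), "request": payload, "flags": 0}

def handle_v1_1(payload):
    # Hypothetical 1.1 body: one flags byte, then the request.
    if len(payload) < 1:
        raise ValueError("1.1 packet is missing its flags byte")
    return {"version": (1, 1), "request": payload[1:], "flags": payload[0]}

# One entry per protocol version; supporting 1.2 means adding one function
# and one line here, without touching the existing handlers.
HANDLERS = {
    (1, 0): handle_v1_0,
    (1, 1): handle_v1_1,
}

def dispatch(packet):
    """Read the (assumed) two-byte version header and hand off the rest."""
    if len(packet) < 2:
        raise ValueError("packet too short to carry a version")
    version = struct.unpack_from("!BB", packet, 0)  # (major, minor)
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported protocol version {version[0]}.{version[1]}")
    return handler(packet[2:])
```

The rest of the program would only ever see the dictionaries the handlers return, so knowledge of the wire layout stays inside one function per version.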
One last aspect I would change is the logging of various statistics,
or “key performance indicators” as they are called in the industry.
Instead of incrementing a bunch of variables in the codebase and every so often dumping them out to syslog
(messy code requiring the use of signals and all the problems that entails,
and several lines of code modified for every new KPI added),
I would use the method they use at Etsy, statsd, or at least
my own take on it.
I don't need the full-blown “all-singing, all-dancing, all-graphing” statsd
that Etsy developed, but one that just logs to syslog().
And given the whole concept is easy,
a small version that just logs to syslog()
is pretty trivial to write
(I wrote a version in Lua with 225 lines of code,
and a full quarter of that is just parsing the command line).
The nice thing about a statsd-like concept is that it is trivial to add new KPIs to the codebase,
and they're logged automatically without any other changes.
The logging and potentially resetting of values is all isolated in statsd,
in the same way that logging messages to files or forwarding them to another server is isolated in syslog.
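To show how little is involved, here is a minimal Python sketch of a statsd-like collector that reports through syslog. It is not the Lua version mentioned above. It handles only counters ("name:value|c") and gauges ("name:value|g") from the statsd line format, the class name TinyStats and the metric names in the usage note are hypothetical, and the syslog module it imports is available on Unix only.

```python
import syslog
from collections import defaultdict

class TinyStats:
    """Accumulates statsd-style counter and gauge lines and logs them on flush."""

    def __init__(self, log=None):
        self.counters = defaultdict(int)
        self.gauges = {}
        # The logger is injectable; by default each line goes to syslog.
        self.log = log or (lambda message: syslog.syslog(syslog.LOG_INFO, message))

    def feed(self, line):
        """Parse one 'name:value|type' line; return False if it is not understood."""
        name, _, rest = line.strip().partition(":")
        value, _, kind = rest.partition("|")
        if not name or not value or kind not in ("c", "g"):
            return False
        try:
            number = int(value)
        except ValueError:
            return False
        if kind == "c":
            self.counters[name] += number  # counters add up between flushes
        else:
            self.gauges[name] = number     # gauges keep the latest value
        return True

    def flush(self):
        """Log every metric, then reset counters (gauges are kept)."""
        for name in sorted(self.counters):
            self.log(f"{name}={self.counters[name]}")
        for name in sorted(self.gauges):
            self.log(f"{name}={self.gauges[name]}")
        self.counters.clear()
```

A program being measured would send a line such as "greylist.requests:1|c" to the collector, typically over UDP, and a timer would call flush() periodically. Adding a new KPI then means sending a new name; the collector itself needs no changes.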
There's not anything else I would really modify in the greylist daemon. Really, the only bad decision I made eight years ago was not fully isolating the protocol. Everything else was an okay decision.
And frankly, I'm not even sure if the greylist daemon needs any more work done on it.