Monday, May 10, 2010
An update on the updated Greylist Daemon
Internet access at Chez Boca was nonexistent today (scuttlebutt: a fibre cut, the third one in about two months) and without the Intarweb pipes, I can't work (good news! I get a day off! Bad news! I can't surf the web!), so I figured I would take the time to get a few personal projects out the door.
First one up: a new version of the greylist daemon has been released. A few bugs are fixed. The first was an error condition that wasn't properly sent back to the gld-mcp. The second was a segmentation fault (not fatal, actually; the greylist daemon restarts itself on a segfault) if it received too many requests (by “too many” I mean “the tuple storage is filled to capacity”; when that happens, it just dumps everything it has and starts over, but I forgot to take that into account elsewhere in the code). The last prevented the Postfix interface from logging any syslog messages (I think I misunderstood how setlogmask() works).
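One easy way to misread setlogmask() (not necessarily the mistake made here) is to hand it a priority instead of a bit mask. The following is a minimal sketch in Python, whose syslog module wraps the same C calls; the priority levels and the `allowed` helper are chosen purely for illustration and are not taken from the greylist daemon.

```python
import syslog

# setlogmask() expects a bit mask of priorities, not a single priority.
# LOG_MASK(p) is the bit for one priority; LOG_UPTO(p) covers every
# priority from LOG_EMERG (0) through p.
only_warning = syslog.LOG_MASK(syslog.LOG_WARNING)
upto_warning = syslog.LOG_UPTO(syslog.LOG_WARNING)

# The pitfall: LOG_WARNING is the number 4 (binary 100). Used as a mask,
# that sets only bit 2, which is the bit for LOG_CRIT.
raw = syslog.LOG_WARNING


def allowed(mask, priority):
    """Hypothetical helper: would a message at this priority pass the mask?"""
    return bool(mask & syslog.LOG_MASK(priority))


for name, mask in (("LOG_MASK", only_warning),
                   ("LOG_UPTO", upto_warning),
                   ("raw priority", raw)):
    passed = [p for p in range(8) if allowed(mask, p)]
    print(name, "lets through priorities", passed)
```

With the raw priority as the mask, everything except LOG_CRIT is silently dropped, which from the outside looks a lot like "no syslog messages at all."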
The other changes are small (a more pedantic value for the number of seconds per year, adding sequence numbers to all logged messages (that's another post) and setting the version number directly from source control) but the reason I'm pushing out version 1.0.12 now (well, aside from there being nothing else to do yesterday) is related to the one outstanding bug (that I know of) in the program. That bug deals with bulk data transfers (Mark has been bitten by it; I haven't) and I suspect I know the problem, but the solution to that problem requires an incompatible change to the program.
Well, okay—the easy solution requires an incompatible change to the program. The problem is the protocol used between the greylist daemon and the master control program. The easy solution is to just change the protocol and be done with it; the harder solution would be to work the change into the existing protocol, and that could be done, but I'm not sure if it's worth the work. The number of people (that I know of) using the greylist daemon I can count on the fingers of one hand, so an incompatible change wouldn't affect that many people.
Then again, I might not hear the end of it from Mark.
In any case, there are more metrics I want to track (more on that below) and those would require a change to the protocol as well (or more technically, an extension to the protocol). The additional metrics may help with some long term plans that involve making the greylist daemon multithreaded (yes, the main page states I can get over 6,000 requests a second; that's a very conservative value. Drop the amount of logging and I can handle over 32,000/second, all on a single core).
Some of the new metrics involve tracking the lowest and highest number of tuples during any given period of time. This should help with fine-tuning the --max-tuples parameter (currently defaults to 65,536). I've noticed over the years that I don't seem to get much past 3,000 tuples at any one time, but I would like to make sure before I tweak the value on my mail server.
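Tracking the lowest and highest tuple counts per period can be sketched roughly as follows. This is an illustration only: `TupleGauge` and its methods are hypothetical names, and nothing here reflects how the greylist daemon actually stores or reports its metrics.

```python
class TupleGauge:
    """Hypothetical gauge: current, lowest and highest tuple counts per period."""

    def __init__(self, current=0):
        self.current = current
        self.low = current
        self.high = current

    def adjust(self, delta):
        """Apply a change in tuple count (+n on add, -n on expire or dump)."""
        self.current += delta
        self.low = min(self.low, self.current)
        self.high = max(self.high, self.current)

    def report_and_reset(self):
        """Return (low, high) for the period, then start a new period."""
        snapshot = (self.low, self.high)
        self.low = self.high = self.current
        return snapshot


gauge = TupleGauge()
gauge.adjust(+3)
gauge.adjust(-1)
gauge.adjust(+5)
first_period = gauge.report_and_reset()
gauge.adjust(-4)
second_period = gauge.report_and_reset()
print(first_period, second_period)
```

If the reported high never comes near the configured maximum over a long stretch, that is reasonable evidence the limit can be lowered.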
The other metrics I want to track are the number of tuple searches (or “reads”) and the number of tuple updates (additions or deletions; in other words, “writes”). These metrics should help if I decide to go multithreaded with the greylist daemon, and with how best to store the tuples depending on whether the application is read heavy or write heavy (but my guess is that reads and writes are nearly equal, which presents a whole set of challenges). But it's not like the program isn't fast as it is. While I claim over 6,000 requests per second, that's a rather conservative figure; drop the logging to a minimum and it can handle over 30,000 requests per second on a single core. I would be interested to see if I can improve on that with additional cores.
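Counting reads and writes could look something like the sketch below. Everything here is assumed for illustration: `CountingStore`, `AccessStats`, the dictionary backing and the example key are all hypothetical, not the daemon's real storage or protocol.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    reads: int = 0
    writes: int = 0

    def read_fraction(self):
        """Share of all operations that were reads (0.0 if nothing happened)."""
        total = self.reads + self.writes
        return self.reads / total if total else 0.0


class CountingStore:
    """Hypothetical tuple store that counts searches and updates."""

    def __init__(self):
        self._tuples = {}
        self.stats = AccessStats()

    def lookup(self, key):
        self.stats.reads += 1
        return self._tuples.get(key)

    def add(self, key, value):
        self.stats.writes += 1
        self._tuples[key] = value

    def remove(self, key):
        self.stats.writes += 1
        self._tuples.pop(key, None)


store = CountingStore()
key = ("192.0.2.1", "from@example.com", "to@example.net")  # made-up tuple
first_seen = store.lookup(key)   # a miss still counts as a read
store.add(key, "greylisted")
second_seen = store.lookup(key)
store.remove(key)
print(store.stats, store.stats.read_fraction())
```

A read fraction near 0.5 would match the guess that reads and writes are about equal, in which case a structure tuned purely for lookups (or purely for updates) would not be an obvious win.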
Since there are quite a few changes that require protocol changes, I decided to just make this the last release of the 1x line, and start work on version 2 of the greylist daemon (or maybe 1.5, and leave 2x for the multithreaded version).