The Boston Diaries

Friday, September 21, 2007

What the heck do they put in the water in Boston?

BOSTON—Troopers arrested an MIT student at gunpoint Friday after she walked into Logan International Airport wearing a computer circuit board and wiring on her sweatshirt. Authorities call it a fake bomb; she called it art.

Via Flutterby, MIT student charged with wearing fake bomb she says was only art

And upon further reading, she wasn't wearing the circuit board to make a statement about airport security theatrics, but to gain attention at a career day at MIT, and was only at the airport to pick up her boyfriend.

Just below the masthead up there, I've written “The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal ‘The Boston Diaries.’” At the time, I didn't like Boston because it's old (you really have to look to find anything in South Florida build prior to the 1960s, and as a result, most every building down here is “modern” in that it was initially designed with bathrooms, decent electrical and more importantly—air conditioning!), cold (water freezes down here at 40°F) and a twisty maze of one-way roads all alike (not to mention the drivers—think we have bad drivers? Hah!).

But this crap? This, along with the Mooninite scare, the Traffic Counter scare, mandatory health insurance (and if you don't have it, you pay even more in taxes) and hypocritical liberal weenies, has dropped my opinion of Boston even further.

Software performance with large sets of data

I have seen this many, many times. Something that runs fast during development and maybe even testing because the data used in testing was too small and didn't match real world conditions. If you are working on a small set of data, everything is fast, even slow things.

Software performance with large sets of data

The primary test I use with greylist daemon is (I think) brutal. I have a list of 27,155 tuples (real tuples, logged from my own SMTP server) of which 25,261 are unique. When I run the greylist daemon, I use an embargo timeout of one second (to ensure a significant number of tuples make it to the whitelist), a greylist timeout of at least two minutes, with the cleanup code (which checks for expired records and removes them) running every minute. Then to run the actual test, I pump the tuple list through a small program that reformats the tuples that the Postfix module expects, which then connects to the daemon. There is no delay in the sending of these tuples— we're talking thousands of tuples per minute, for several minutes, being pumped through the greylist daemon.

Now, there are several lists the tuple is compared against. I have a list of IP addresses that will cause the daemon to accept or reject the tuple. I can check the sender email address or sender domain, and the recipient email address or domain. If it passes all those, then I check the actual tuple list. The IP list is a trie (nice for searching through IP blocks). The other lists are all sorted arrays, using a custom binary search to help with inserting new records.

Any request that the server can handle immediately (say, checking a tuple, or returning the current config or statistics to the Greylist Daemon Master Control Program) are done in the main processing loop; for longer operations (like sending back a list of tuples to the Master Control Program) it calls fork() and the child process handles the request.

I haven't actually profiled the program, but at this point, I haven't had a need to. It doesn't drop a request, even when I run the same grueling test on a 150MHz PC).

I just might though … it would be interesting to see the results.

Friday, September 21, 2007

What the heck do they put in the water in Boston?

Software performance with large sets of data

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer