The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, October 12, 2007

Spam from bogus IP space

Earlier today (okay, technically yesterday) I came across the concept of bogons, or IP address not officially allocated for use. They even provide a current list of non-routed IP blocks. Curious about the effect of using said list to block potential spam, I setup a test, consisting of 565,012 tuples (we've stepped up testing of the greylist daemon over the past week) previously processed (I'm keeping some extensive logs here), added the 6,803 IP blocks not allocated, and let it rip.

An hour and a half later, I had my answer.

Of 565,012 tuples processed, only 6,117 came from non-allocated IP space.

It's a little over 1%.

I don't think it's worth adding the non-allocated IP space to the greylist daemon. Not that it makes the daemon run slower, it's just that an IP list of that size takes up quite a bit of memory due to the trie structure used to store the table, and for such a small gain, I don't feel it's really worth it.


I guess he won't be extending anyone's mortgage by three inches anymore …

Who hated Tolstokozhev so much as to hire a hit man to assasinate him? Well, I guess you have about one billion e-mail users to suspect. Tolstokozhev was a famous spammer who sent millions of e-mail promoting viagra, cialis, penis enlargement pills and other medications. Links in these e-mails usually led to some pharmacy shop, which paid Tolstokozhev a share of its revenue. This is a well known affiliate scheme employed by spammers worldwide. Tolstokozhev is estimated to be responsible for up to 30% percent of all viagra and penis enlargement related spam.

Via Wlofie in email, Russian Viagra and Penis Enlargement Spammer Murdered

Well, speaking of anti-spam measures, that certainly works (not that I … um … condone … such actions). A bit messy, in both physical and legal aspects, but still, quite effective.

Update a few minutes later

Then again, it may be a hoax:


Some musings on parallelizing programs

Can you, in your favourite language, show me the easiest way to parallelise the following program (written in C)? I am just genuinely curious.

int gimme(int x)
{
  return x * x;
}

int main(void)
{
  int i = 2, j = 4;
  int a = gimme(i),
      b = gimme(j);
  return a * b;
}

Write Me A Small Program

In my ideal world, any compiler worth its salt would optimize (mostly through variable lifetime analysis and constant propagation) that program down to the following:

int main(void)
{
  return(64);
}

but the author's main point is—what languages today can easily parallelize that code? His point, on a massively parallel computer, the calls to gimme() are both independant of each other, and therefore can be computed at the same time (assume, for the sake of argument, that gimme() takes a bit more time to execute than a single function call and multiply).

And in an ideal world, a compiler should be able to figure out that gimme() is, for lack of a better term, a “pure” function (one that does not rely upon data outside of its input parameters and local variables). This also implies that thread creation is cheaper than it is today (or easier, which is another of the author's points), and that the compiler will take care of the messy thread creation details for me.

But even an ideal compiler won't take us very far.

I was thinking about this as I was running tests on the greylist daemon last night. The program maxed out at around 130 requests per second, and this on a dual-core 2.6GHz Intel Pentium system that hasn't even used swap space, so speed and memory isn't an issue (and remember, the greylist daemon keeps everything in memory).

But as written, the program only uses a single thread of execution. Could I not create two threads (one for each CPU in the machine) and double the speed?

Maybe, but the compiler won't help me much.

When a tuple [IP address, sender address, recipient address] comes in, the program scans a list of IP addresses which can accept or reject that tuple. If the tuple isn't outright accepted or rejected based on the IP address, then the sender address is checked, then the sender domain, then the recipient address, then the recipient domain. Now, those lists are, for the most part, read-only, so no coordination is required between multiple threads for read access.

The actual greylist itself? The program checks to see if the tuple [IP address, sender address, recipient address] already exists, and if not, adds it to the existing list of tuples. And that part, adding the tuple to the list, happens on just about every tuple that comes in. In effect, the tuple list is “write- mostly.” And to ensure a consistent list of tuples, you need to coordinate multiple threads when trying to update the list. And now you get into semaphores, and locking and dining philosophers

Messy stuff.

I suppose each thread could get its own tuple list to update, but then you have to ensure that any given tuple X always goes to the same thread. Easy enough I suppose, but it's not something that you can leave up to the compiler.

In fact, in an ideal world, an ideal compiler would be hard-pressed to automagically parallelize this program. Certain parts, maybe. But overall, it would required thinking how best to write the program to be parallelized in the first place.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.