The Boston Diaries

Wednesday, April 30, 2008

Now that I think about it, I doubt spam will ever go away …

I was talking with G, our Cisco consultant, about a networking issue I had when the talk turned to spam (as that's the root cause of all our email problems at The Company, which currently comprise half the issues in our trouble ticket system). One of the surprising bits of information G conveyed was that spammers are now using /16 network blocks for spamming.

A /16 network block is 65,536 consecutive IP addresses, and such blocks are rather hard to come by (the smallest block ARIN will hand out is a /20, which gives 4,096 consecutive IPs, and even then you have to pay quite a bit and justify the usage of said block). The spammer then runs through the /16 (probably using 256 addresses, or a /24, the smallest routable block [1], at a time) until it's been blocked, and then sells it or returns it; the spammer then obtains another /16 to abuse.
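The block sizes above fall straight out of the prefix length: an IPv4 /N block holds 2^(32-N) addresses. Here is a minimal sketch in C of that arithmetic; the function names `cidr_block_size` and `cidr_subblocks` are made up for illustration and don't come from any library.

```c
#include <stdint.h>

/* Number of IPv4 addresses in a CIDR block with the given prefix
 * length (0 through 32).  Returns 0 for an invalid prefix.
 * Hypothetical helper, named for this example only. */
uint64_t cidr_block_size(int prefix)
{
  if (prefix < 0 || prefix > 32)
    return 0;
  return (uint64_t)1 << (32 - prefix);
}

/* How many blocks of prefix length `small` fit inside one block of
 * prefix length `big` (for example, /24s inside a /16).  Returns 0
 * if the arguments don't describe a block nested in a larger one. */
uint64_t cidr_subblocks(int big, int small)
{
  if (big < 0 || small > 32 || big > small)
    return 0;
  return (uint64_t)1 << (small - big);
}
```

So `cidr_block_size(16)` is 65,536, `cidr_block_size(20)` is 4,096, and `cidr_subblocks(16, 24)` says a /16 can be burned through as 256 separate /24s.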

G also talked about other companies providing anti-spam techniques (like Barracuda) and the more he talked, the more I realized that spam is never going away—there's too much money to be made, by both sides.

Hypothetically speaking, if a spammer approached us and offered tons of money to use some of our IPs (we actually have a few /20s) to spam, if the money was good enough … well … the money would have to be mad money … and even then … well … everybody has their price. Anyway (this is a hypothetical situation) we make money by facilitating spamming. The spammer obviously makes money somehow or he wouldn't be doing this. The anti-spam companies like Barracuda make money by offering spam filtering services that need to be continuously updated.

So, what incentive is there for the commercial anti-spam companies to see spam eliminated completely? (as in, spammers never spam anymore)

Hmmm … sounds like anti-virus companies (not that I'm saying that anti-spam companies spam, but that their incentives are centered around staying in business, and totally eliminating spammers is not conducive to their remaining in business).

I only hope this is my more cynical side talking here, and not reality.

[1] Technically, the smallest routable block consists of four consecutive IP addresses (a /30). What I mean by “smallest routable block” in this context is the smallest route that is accepted by the backbone routers.

Stone knives and bear skins, Part II

I briefly mentioned “Project Leaflet” before, with respect to separating logic, language and layout of an application (in this case, a PHP web application), possibly with the use of an IDE.

But the problem goes deeper—what if you need alternative versions of the language? Or logic?

In C, this is handled by conditional compilation:

#ifdef MSDOS
  fp = fopen("C:\\temp\\foobar","wb");
#elif defined(VMS)
  fp = fopen("SYS$USERS:[TEMP.FOOBAR]","wb");
#elif defined(UNIX)
  fp = fopen("/tmp/foobar","w");
#else
  fp = fopen("foobar","wb");
#endif
  if (fp == NULL)
  {
#if defined(UNIX)
    fprintf(stderr,"could not open /tmp/foobar\n");
    return(ENOENT);
#elif defined(MSDOS)
    fprintf(stderr,"could not open C:\\TEMP\\FOOBAR\n");
    return(ENOTFOUND);
#elif defined(VMS)
    fprintf(stderr,"could not open SYS$USERS:[TEMP.FOOBAR]\n");
    return(ENOFILE);
#else
    fprintf(stderr,"could not open foobar\n");
    return(EXIT_FAILURE);
#endif
  }

As you can see, this method leaves a lot to be desired, but still, it's much better than what you get with PHP.

One of the design requirements for “Project Leaflet” is that it can use either MySQL or PostgreSQL. I've already gone through the code and abstracted out the database calls on the (okay, laughably incorrect) assumption that the SQL statements themselves won't require changing.

Ha ha.

Now granted, for the most part, the SQL statements are simple enough that either MySQL or PostgreSQL can run them without problem. But there are a few rough spots, like:

$query = "SELECT "
       . "  *, "
       . "  DATE_FORMAT(sent, '%b. %e, %Y at %l:%i%p') as datesent "
       . "FROM pl_emails WHERE id=$id";

PostgreSQL doesn't understand DATE_FORMAT(); no, it wants TO_CHAR(). To make things even more amusing, the format string is completely different:

$query = "SELECT "
       . "  *, "
       . "  TO_CHAR(sent, 'Mon DD YYYY at HH12:MIam') as datesent "
       . "FROM pl_emails WHERE id=$id";

So right now I'm looking at two codebases, separated by a common language. Sure, there are any number of methods to merge the two into a common codebase:

//---------------
// Variant 1
//---------------

	// would this even work, as it requires 
	// the use of $id ... 

$query = $db_view_query['all_by_date'];

//-----------
// Variant 2
//-----------

if ($db === "MySQL")
{
  $query = "SELECT "
       . "  *, "
       . "  DATE_FORMAT(sent, '%b. %e, %Y at %l:%i%p') as datesent "
       . "FROM pl_emails WHERE id=$id";
}
elseif ($db === "PostgreSQL")
{
  $query = "SELECT "
       . "  *, "
       . "  TO_CHAR(sent, 'Mon DD YYYY at HH12:MIam') as datesent "
       . "FROM pl_emails WHERE id=$id";
}
else
{
  // -------------------------------------
  // love the way the language separation 
  // was done ... 
  // -------------------------------------

  die ($lang['a_horrible_death']);
}

//----------------
// Variant 3
//----------------

$query = "SELECT "
       . "  *, "
       . $dbdatefunct . "(sent,'$dbdateformat') as datesent "
       . "FROM pl_emails WHERE id=$id";

Each solution is worse than the previous one. At least C has the decisions being made at compile time; I'm stuck with runtime decisions, or with very gross self-modifying code (variant #3; yes, that's what that is, self-modifying code).
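For comparison with the C example earlier, here is a rough sketch of how the same runtime choice might look with the backend-specific fragments kept in one table instead of scattered through the code. Everything named here (`db_backend`, `date_expr`, `build_email_query`) is hypothetical and not part of “Project Leaflet”; it also assumes the id is an integer, so formatting it into the string is safe, where real code would more likely use the database library's own parameter placeholders.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical backend identifiers, invented for this sketch. */
enum db_backend { DB_MYSQL, DB_POSTGRESQL, DB_COUNT };

/* One date-formatting expression per backend, all in one place.
 * In PostgreSQL's TO_CHAR() the minutes pattern is MI (MM is the
 * month number). */
static const char *const date_expr[DB_COUNT] = {
  "DATE_FORMAT(sent, '%b. %e, %Y at %l:%i%p')",
  "TO_CHAR(sent, 'Mon DD YYYY at HH12:MIam')",
};

/* Build the query for one email into buf.  Returns 0 on success,
 * -1 if the backend is unknown or the buffer is too small.  The
 * table entry is passed as a %s argument, never as the format
 * string, so the '%' characters in the MySQL pattern are copied
 * through untouched. */
int build_email_query(char *buf, size_t len, enum db_backend db, long id)
{
  int n;

  if ((int)db < 0 || (int)db >= DB_COUNT)
    return -1;

  n = snprintf(buf, len,
               "SELECT *, %s AS datesent FROM pl_emails WHERE id=%ld",
               date_expr[db], id);
  if (n < 0 || (size_t)n >= len)
    return -1;
  return 0;
}
```

This is still a runtime decision, so it doesn't escape the complaint above; it only confines the per-database differences to a single table.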

As it stands right now, I have two branches of the code, a MySQL version and a PostgreSQL version, and I'm wavering between keeping them separate or merging the two, and the “keep them separate” faction is winning. That's because I'm currently using git, which makes branching a no-brainer (no, truly: switching between branches is trivial and takes no time at all; yes, it's a bit clunky trying to keep a central repository using git, but the branching is worth the clunkiness). And git's merging capabilities mean that propagating fixes between the branches is easy as well (for fixes that apply across all branches, obviously). git comes very close to the fine-grained revision control I talked about.

So, not only do I want fine-grained revision control, but a way to say “these changes I'm making apply to all the branches, and these changes only to this branch over here.”
