The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Saturday, September 27, 2003

The Economics of Spam

Paul Graham is bullish on spam. Not that he likes spam, but he does believe that current anti-spam techniques will make spam uneconomical. But what are the current economics of spam? He doesn't really cover that, but he does mention a few numbers:

In an article in the Detroit Free Press, one spammer said that he charged a flat fee of $22,000 to send mail to his entire list of 250 million addresses. If filters cut response rates by a factor of 100, the average value of what he was selling would sink to $220. I doubt that would even cover his costs.

Will Filters Kill Spam?

While $22,000 to send unsolicited email may seem expensive, for the reported 250,000,000 recipients it can be pretty darned cheap! That works out to about 1/100 of a cent per recipient. For the same amount, a traditional junk mailing (you know, the physical mail you get in your mailbox) would reach only about 100,000 recipients. So far, it seems decent.

The person who responds to spam is a rare bird. Response rates can be as low as 15 per million. That's the whole problem: spammers waste the time of a million people just to reach the 15 stupidest or most perverted.

Will Filters Kill Spam?

Traditional junk mail has a typical response rate of 1-2%, with 3% being an incredible “Let's do another run now!” Fifteen responses per million? That's a response rate of 0.0015%. Granted, that's at the low end of things, but are we still getting our money's worth? With a traditional mass mailing you end up spending between $7 (at a 3% response rate) and $22 (at 1%) to land a customer. So even with the low 15-per-million rate you get via spam (3,750 responses from 250,000,000 addresses), you end up paying about $6 per respondent. It's not until the rate drops to four respondents per million that you are paying $22.
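The cost-per-respondent arithmetic above can be checked with a few lines of code. This is a minimal sketch that assumes only the figures quoted in the post ($22,000 buying either 250,000,000 spam recipients or roughly 100,000 pieces of postal junk mail); the helper `cost_per_respondent` and the constant names are hypothetical, made up for illustration.

```python
# Cost per respondent, using only the figures quoted in the post.
# The constant and function names here are hypothetical.
BUDGET = 22_000            # dollars: the quoted flat fee
SPAM_LIST = 250_000_000    # addresses on the spammer's claimed list
JUNK_MAIL_LIST = 100_000   # recipients the same money buys in postal junk mail


def cost_per_respondent(budget, recipients, response_rate):
    """Dollars spent for each person who actually responds."""
    responses = recipients * response_rate
    return budget / responses


if __name__ == "__main__":
    # Traditional junk mail at 1% and 3% response rates.
    for rate in (0.01, 0.03):
        cost = cost_per_respondent(BUDGET, JUNK_MAIL_LIST, rate)
        print(f"junk mail @ {rate:.0%}: ${cost:.2f} per respondent")
    # Spam at 15 and 4 responses per million.
    for per_million in (15, 4):
        cost = cost_per_respondent(BUDGET, SPAM_LIST, per_million / 1_000_000)
        print(f"spam @ {per_million}/million: ${cost:.2f} per respondent")
```

Run as a script, it reproduces the post's numbers: $22.00 and $7.33 for junk mail, $5.87 and $22.00 for spam.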

Scarily enough, it still looks good.

So for a large enough company like Negiyo, paying someone $22,000 to spam is worth it. But this is from the client side, the side willing to pay $22,000 to send spam. How do the economics work from the spammer's side? Because it sure is tempting. $22,000 to send some emails? No wonder people do it.

But 250,000,000 emails?

I won't even get into the number of dead addresses (let's see, spc@pineal.math.fau.edu, spc@cse.fau.edu, spc@armigeron.com, spc@fdma.com, spc@gate.net, spc@emi.net and sconner@verio.net are no longer valid, and those are just the ones I remember) or the multiple addresses per person (I just counted, and I have at least 37 in active use, most as spam traps) that such a list would contain. Instead, I'm going to concentrate on the physical act of sending 250,000,000 pieces of email.

Assuming one second per email, and assuming 100,000 seconds per day (it's actually 86,400 seconds per day, but hey, this is a rough calculation here) that's:

$$2.5 \times 10^{8} \div 10^{5} = 2.5 \times 10^{3}$$

That's … um … 2,500 days! Which ends up being … about seven years. So obviously one per second isn't going to cut it. At ten per second it will then take 250 days and at 100/sec only 25 days. But 100 connections per second is very optimistic—10/sec is more realistic. I should know, as I wrote some code to check. A naive implementation will make a separate connection per email:

SMTP connections: one email per connection

Type of connection            Time per connection
Local SMTP                    0.07 sec
Cable modem to server SMTP    0.45 sec
Fat pipe to server SMTP       0.24 sec

This table covers only the connections; it doesn't include the time to send the actual email, so real-world times are going to be somewhat worse than reported. Using a cable modem to send the spam isn't going to cut it. Never mind that you'll lose the connection after a few hours; it just isn't fast enough. You'll need a server sitting right on the Internet, and even then the simple method will give you only about four emails per second, which will still take almost two years.
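Here is a rough sketch of the sending-time arithmetic, covering both the "N emails per second" scenarios and the single-connection table. It assumes one sequential sender, the post's rounded 100,000-second day, and the measured connection times standing in for the whole per-email time. The `days_to_send` helper is a hypothetical name.

```python
# How long does one sequential sender take to push out 250 million
# messages? Uses the post's rounded 100,000-second day (really 86,400).
TOTAL_EMAILS = 250_000_000
SECONDS_PER_DAY = 100_000  # deliberately rounded, as in the post


def days_to_send(seconds_per_email, total=TOTAL_EMAILS, seconds_per_day=SECONDS_PER_DAY):
    """Days needed to send `total` emails one after another."""
    return total * seconds_per_email / seconds_per_day


if __name__ == "__main__":
    # The 1, 10 and 100 emails-per-second scenarios.
    for rate in (1, 10, 100):
        print(f"{rate:>3}/sec: {days_to_send(1 / rate):,.0f} days")
    # Connection times from the first table (one email per connection).
    table = {
        "Local SMTP": 0.07,
        "Cable modem to server SMTP": 0.45,
        "Fat pipe to server SMTP": 0.24,
    }
    for name, secs in table.items():
        days = days_to_send(secs)
        print(f"{name}: {1 / secs:.1f} emails/sec, {days:,.0f} days ({days / 365:.1f} years)")
```

With the rounded day, the fat pipe comes out to 600 days, about 1.6 years. With the real 86,400-second day, 60,000,000 seconds is closer to 694 days, which is where "almost two years" lands.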

But there is no reason to send a single email per connection; the SMTP allows multiple recipients per connection, so doing that gives:

SMTP connections: multiple emails per connection

Type of connection            Time per email
Local SMTP                    0.02 sec
Cable modem to server SMTP    0.11 sec
Fat pipe to server SMTP       0.05 sec

With the more sophisticated implementation you can get about 20 emails per second, giving about 125 days to send 250,000,000 emails, given a nice fat pipe to the Internet. Four months.
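This sketch puts the two tables side by side, showing what batching recipients buys. It assumes a single sequential sender and treats each table entry as the full time per email, so throughput is simply the reciprocal of that time. The function names are hypothetical.

```python
# Compare one email per connection with many recipients per connection,
# using the measured times from the two tables above.
TOTAL_EMAILS = 250_000_000
SECONDS_PER_DAY = 100_000  # rounded, as in the post

single = {"Local SMTP": 0.07, "Cable modem to server SMTP": 0.45, "Fat pipe to server SMTP": 0.24}
batched = {"Local SMTP": 0.02, "Cable modem to server SMTP": 0.11, "Fat pipe to server SMTP": 0.05}


def throughput(seconds_per_email):
    """Emails per second for one sequential sender."""
    return 1 / seconds_per_email


def days(seconds_per_email):
    """Days to send the whole list at this per-email time."""
    return TOTAL_EMAILS * seconds_per_email / SECONDS_PER_DAY


if __name__ == "__main__":
    for name in single:
        s, b = single[name], batched[name]
        print(f"{name}: {throughput(s):.1f} -> {throughput(b):.1f} emails/sec, "
              f"{days(s):,.0f} -> {days(b):,.0f} days ({s / b:.1f}x faster)")
```

For the fat pipe this gives 20 emails per second and 125 days, matching the paragraph above.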

And a fat pipe will set you back $2,000 to $10,000 per month. Assuming you can keep your connection for four months. And you most likely won't, so you'll have to keep busy finding new providers. But even with the cheapest connection, that's $8,000 of overhead (excluding equipment and office space), leaving you $14,000 for four months of work.

It works out to about $750/week. Doesn't sound so impressive now.
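Here is the spammer's budget worked through. This sketch assumes the cheapest $2,000/month pipe, the four-month (125-day) run from above, and nothing else: no equipment, office space, or replacement providers. The constant names are hypothetical.

```python
# The spammer's side of the $22,000, using the cheapest connection
# and the 125-day run computed above. Equipment and office space
# are excluded, as in the post.
FEE = 22_000
MONTHS = 4
CHEAPEST_PIPE_PER_MONTH = 2_000
RUN_DAYS = 125

overhead = MONTHS * CHEAPEST_PIPE_PER_MONTH  # connectivity cost
left_over = FEE - overhead                   # what's left for the spammer
weekly = left_over / (RUN_DAYS / 7)          # spread over the run, per week

if __name__ == "__main__":
    print(f"overhead ${overhead:,}, left ${left_over:,}, about ${weekly:,.0f}/week")
```

Spread over 125 days, this comes to roughly $784 a week. That is in the same ballpark as the rough $750 figure, and it only gets worse with a pricier pipe.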

But still, 250,000,000 emails? The upper limit appears to be on the order of 20/second and that's slamming a server, which is going to be noticed. That person claiming to send 250,000,000 emails for $22,000 is either struggling to keep going, or is basically lying and maybe only sending several million at best.

Fortunately (and bear with me for a second), such economics mean that there are only going to be a few spamhouses to contend with; smaller ones won't last long given the rather large expense of keeping connectivity. And if Bayesian filters (and others, like the Controllable Regex Mutilator, which filters out 99.9% of spam) see mass adoption (I've heard that AOL and MSN might be adding Bayesian filtering to their email systems), the economics of spam might just put the spammers out of business.

We can only hope.

Update on Sunday, September 28th

I just realized that my math in calculating an upper limit on the number of SMTP connections is off, although the numbers reported are accurate. For instance, in the first table, it takes my computer here about half a second to send an email, but that's over a single connection! I neglected to take into account multiple copies running at the same time, so the figures alone don't tell the whole story, and it may very well take less time to actually send 250,000,000 emails than I indicated.

Basically, a dedicated T1 will give you 1.544Mbps, which works out to about 193,000 bytes per second. SMTP runs over TCP/IP, which adds a minimum of 40 bytes of header overhead per packet, meaning that at best you can schlep out about 4,825 empty TCP/IP packets per second over the T1. To leave some room, say you can get an average of 2,500 packets per second, meaning you can maintain 2,500 connections (assuming your computer can handle that many concurrent connections). Given that, and assuming it still takes a second to send an email, with 2,500 concurrent connections it will take 100,000 seconds, or one day, to send out 250,000,000 emails. That doesn't sound right to me, but then again, I'm not sure there are many network stacks that could handle 2,500 concurrent connections without getting so bogged down that throughput drops to near zero (basically, I suspect the computer will then become CPU bound and possibly network I/O bound).
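Here is a rough sketch of the T1 estimate. It assumes the nominal 1.544 Mbps T1 line rate, the 40-byte minimum header (20 bytes IPv4 plus 20 bytes TCP, no options), the 2,500 concurrent connections picked above, and one second per email on each connection. The constant names are hypothetical.

```python
# Rough T1 capacity estimate. Assumptions: nominal T1 line rate,
# minimum IPv4 + TCP headers, 2,500 concurrent connections, and
# one second per email on each connection.
T1_BITS_PER_SEC = 1_544_000
HEADER_BYTES = 40          # 20-byte IPv4 header + 20-byte TCP header, no options
CONCURRENT = 2_500         # the "leave some room" figure from the text
SECONDS_PER_EMAIL = 1
TOTAL_EMAILS = 250_000_000
SECONDS_PER_DAY = 100_000  # rounded, as in the original post

bytes_per_sec = T1_BITS_PER_SEC / 8                # raw line capacity in bytes
max_empty_packets = bytes_per_sec / HEADER_BYTES   # header-only packets per second
total_seconds = TOTAL_EMAILS * SECONDS_PER_EMAIL / CONCURRENT

if __name__ == "__main__":
    print(f"{bytes_per_sec:,.0f} bytes/sec")
    print(f"{max_empty_packets:,.0f} empty packets/sec at most")
    print(f"{total_seconds:,.0f} seconds ({total_seconds / SECONDS_PER_DAY:.1f} days, rounded day)")
```

This counts only headers. Every real email also carries its message body and the SMTP commands, so actual throughput over the T1 would be considerably lower than the empty-packet ceiling.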

Of course, that much network activity will be noticed, and the rest of what I have to say doesn't change.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.