The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, August 13, 2007

Ramblings about search engine optimizations and bandwidth utilization

For the past week or so, I've been playing around with search engine optimizations (that last link is so I know what not to do) and poring through the log files.

The last time I made a major search engine optimization to my site was four years ago, and the reason for that optimization was to get rid of the disturbing search requests that were plaguing the log files (and my mind) at the time. It also had the added benefit of reducing the amount of “duplicate content” on my site. A search engine like Google would skip indexing the monthly archives (as well as the front page) but would index the individual entries. The end result: no more disturbing search requests, and better results for people actually looking for stuff.

But it didn't reduce all the duplicate content. There was still the small problem of /2000/1/1.1 having the same content as /2000/01/01.1 (note the leading zeros). Technically, they are two separate pages, each with a unique URL, although internally, the leading zero is ignored by my blogging engine and it would happily serve up the page under either location.

Now, that particular duplicate content issue is something I've known about since I started writing mod_blog, and I had code to distinguish between the two requests but never wrote the code to do anything about it. Until last week. Now, go to /2000/1/1.1 and you'll get a permanent redirect to /2000/01/01.1. This change should further reduce the amount of “duplicate content” on my site, as well as reduce the number of hits from web spiders indexing my site. (The redirection doesn't happen under one very particular condition, and fixing that pretty much requires a complete overhaul of some very old code, but it's such a seldom-used bit of code that I'm not terribly worried about it.)
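For the curious, the idea can be sketched in a few lines of Python. This is not the actual mod_blog code, just a minimal illustration of the technique; the URL pattern (year, month, day, then an entry number) and the names `canonical_path` and `respond` are assumptions made up for this example.

```python
import re

# Hypothetical pattern for entry URLs of the form /YYYY/M/D.N
# (year, month, day, then the entry number for that day).
ENTRY_URL = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})\.(\d+)")

def canonical_path(path):
    """Return the zero-padded form of an entry path, or None if it isn't one."""
    m = ENTRY_URL.fullmatch(path)
    if m is None:
        return None
    year, month, day, part = m.groups()
    return f"/{year}/{int(month):02d}/{int(day):02d}.{part}"

def respond(path):
    """Return (status, location): 301 to the canonical form, else 200 as-is."""
    target = canonical_path(path)
    if target is not None and target != path:
        return 301, target   # permanent redirect to the zero-padded URL
    return 200, path         # already canonical, or not an entry URL

print(respond("/2000/1/1.1"))
print(respond("/2000/01/01.1"))
```

The first call asks for a redirect to the padded URL; the second is served directly, so each entry ends up with exactly one address as far as a spider is concerned.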

I'm a bit concerned about the spiders because of some other information I've pulled out of the log files. My archive of log files (at least, for this blog) goes back to October of 2001, and using some homegrown tools, I generated (with the help of GNUPlot) this graph of the growth of my site over the past six years:

[Graph of traffic growth at The Boston Diaries]

In red, you see the number of raw hits to this site (with the scale along the left hand side), with some explosive growth in early 2006 and again in just the last few months here. In green you see the actual bytes transferred (with its scale along the right hand side)—pretty steady up until January of 2006 when it goes vertical, and again it goes vertical in just the past few months.
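The tallying behind a graph like this is simple enough to sketch. What follows is not the homegrown tooling mentioned above, only a rough Python approximation; it assumes the logs are in Apache's common/combined format, and the names `LOG_LINE` and `monthly_totals` are made up for the example.

```python
import re
from collections import defaultdict

# Assumes Apache common/combined log format; other formats need another regex.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[\d{2}/(\w{3})/(\d{4}):[^\]]+\] "[^"]*" \d{3} (\d+|-)'
)
MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def monthly_totals(lines):
    """Return {(year, month): [hits, bytes]} for every parseable log line."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group(1) not in MONTHS:
            continue                      # skip anything we can't parse
        mon, year, size = m.groups()
        entry = totals[(int(year), MONTHS[mon])]
        entry[0] += 1                     # one raw hit
        entry[1] += 0 if size == "-" else int(size)  # "-" means no body sent
    return dict(totals)

def as_plot_rows(totals):
    """Format the totals as 'YYYY-MM hits bytes' rows, oldest month first."""
    return [f"{y}-{m:02d} {hits} {size}"
            for (y, m), (hits, size) in sorted(totals.items())]
```

Rows in that shape (one month per line, hits in the second column, bytes in the third) are easy to feed to a plotting program with one series per vertical axis.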

And I'm at a loss to explain the sudden explosion of bandwidth usage on my site, unless it's a lot of people hot linking to images on this site (and yes, that does happen quite often), or a vast increase in the number of spiders indexing my site (and for the past few months, Yahoo's Slurp has been generating about 40,000 hits a month).
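One way to test those two guesses would be to split the traffic into spiders, hot-linked images, and everything else, then compare the byte counts. Here is a minimal Python sketch, assuming Apache's combined log format (which records the referrer and user agent). The list of spider tokens and image extensions, the host name `blog.example.org`, and the function names are all illustrative assumptions, not anything taken from the real logs.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumes Apache "combined" format: ... "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} (\d+|-) "([^"]*)" "([^"]*)"'
)
SPIDER_TOKENS = ("slurp", "googlebot", "msnbot")   # illustrative, not exhaustive
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif")

def classify(path, referer, agent, own_host):
    """Label one request as 'spider', 'hotlink' or 'other'."""
    if any(token in agent.lower() for token in SPIDER_TOKENS):
        return "spider"
    try:
        ref_host = urlparse(referer).hostname      # None for "-" or ""
    except ValueError:
        ref_host = None                            # malformed referrer
    # An image requested from a page on some other host counts as a hot link.
    if path.lower().endswith(IMAGE_EXTS) and ref_host and ref_host != own_host:
        return "hotlink"
    return "other"

def usage_by_category(lines, own_host):
    """Return {category: [hits, bytes]} over the parseable log lines."""
    usage = defaultdict(lambda: [0, 0])
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue
        path, size, referer, agent = m.groups()
        entry = usage[classify(path, referer, agent, own_host)]
        entry[0] += 1
        entry[1] += 0 if size == "-" else int(size)
    return dict(usage)
```

It's crude (a spider can lie about its user agent, and plenty of browsers send no referrer at all), but it would at least show which bucket the bytes are landing in.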

I may no longer have disturbing search requests, but I now have a disturbing use of bandwidth.

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site:, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.