The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, April 09, 2002

The Ins and Outs of Calculating Weblog Traffic

The buzz in Bloggerton is about numbers: specifically, the number of readers a blog has, which is not an easy number to calculate. Over the past few months I've been measuring myself against Sean Tevis, a fellow South Florida blogger (whom I actually met in real life once). For a while we were pretty much at parity, but over the past month or so he's taken off. As he states (as of today), he is getting 4,000 visits and 10,000 page views per month.

And I'm wondering just how he's calculating that.

So here we go. Raw counts for The Boston Diaries: January 2002: 14,297 requests. February 2002: 8,035 requests. March 2002: 7,860 requests. Yes, there's a rather big drop there between January and February, but that can be accounted for: 5,870 requests in January were from easily identifiable search engine robots (4,726 from one alone). If we rerun the count for just the popular browsers (basically, any agent reporting itself as Mozilla, which Netscape, Mozilla, Opera and IE all do; yes, that does skip Lynx, but the number of hits via Lynx that aren't me is minuscule for purposes of the rough estimates I'm doing here) and only pages (or files) that were successfully served up, we get: January 2002: 5,880 requests. February 2002: 6,089 requests. And March 2002: 5,292 (ouch).

Now, I'm generating this by going over the raw logs with a custom program I wrote that allows one to filter out fields (to make it easier to grep through). Those last figures, for instance, were done with:

escanlog -status 200 -agent boston.conman.org | grep Mozilla | wc -l

escanlog is the program I wrote, and I instructed it to only print out records that successfully completed (-status 200) and only print out the user agent field (-agent) on the log file in question (boston.conman.org). grep and wc -l are standard Unix programs to search for patterns and count characters (or lines, in this invocation).
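For anyone without escanlog, the same count can be roughed out in a few lines of Python. This is only a sketch: it assumes the log is in Apache's "combined" format (the post doesn't say), and the function name and the sample records are made up for illustration.

```python
import re

# Assumed layout: Apache "combined" log format. The post does not say which
# format the server writes, so this pattern is an assumption.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_successful_mozilla(lines):
    """Count status-200 records whose user agent contains 'Mozilla'."""
    count = 0
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that do not parse
        if match.group("status") == "200" and "Mozilla" in match.group("agent"):
            count += 1
    return count

# Made-up sample records (documentation addresses), for illustration only.
SAMPLE = [
    '192.0.2.1 - - [09/Apr/2002:10:00:00 -0400] "GET /2002/04/09 HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5)"',
    '192.0.2.2 - - [09/Apr/2002:10:00:05 -0400] "GET /robots.txt HTTP/1.0" 200 64 "-" "ExampleBot/1.0"',
    '192.0.2.3 - - [09/Apr/2002:10:00:09 -0400] "GET /missing HTTP/1.0" 404 210 "-" "Mozilla/5.0"',
]

print(count_successful_mozilla(SAMPLE))
```

Only the first sample record counts: the second is a robot and the third was not served successfully.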

But those figures are, again, misleading. They include images and requests for the RSS file and the CSS file: extraneous stuff that doesn't really constitute an actual page view. Going over the logs again, this time only taking into account pages (most likely) viewed by humans, we get: January 2002: 1,805. February 2002: 2,090. March 2002: 1,538 (ow! But it's still an improvement over December 2001 at 1,090).

Oh wait, one more variable to control for: those counts include those I've done. Remove those, and the results are: December 2001 (since I included it above): 1,009. January 2002: 1,673 (well, Rob and Spring are also being excluded—yea, that's why I had over 100 visits from myself). February 2002: 1,909. March 2002: 1,328 (oooh).
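The last two filters (pages only, and nobody from the excluded addresses) might look something like the sketch below. The extension list and the function names are hypothetical, chosen for illustration; the post doesn't say how pages were actually told apart from everything else.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as "not a page".
NON_PAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".rss", ".xml"}

def is_page_view(request_line):
    """Return True if a request line like 'GET /2002/04/09 HTTP/1.0' looks like a page."""
    parts = request_line.split()
    if len(parts) < 2 or parts[0] != "GET":
        return False
    path = urlsplit(parts[1]).path  # drop any query string
    extension = posixpath.splitext(path)[1].lower()
    return extension not in NON_PAGE_EXTENSIONS

def count_page_views(records, own_addresses):
    """Count page views among (host, request_line) pairs, skipping own_addresses."""
    return sum(
        1
        for host, request_line in records
        if host not in own_addresses and is_page_view(request_line)
    )

# Made-up records; 192.0.2.7 stands in for an address to exclude.
RECORDS = [
    ("192.0.2.1", "GET /2002/04/09 HTTP/1.0"),
    ("192.0.2.1", "GET /boston.css HTTP/1.0"),
    ("192.0.2.7", "GET /2002/04/09 HTTP/1.0"),
    ("198.51.100.4", "GET /index.rss HTTP/1.0"),
    ("198.51.100.4", "GET /2002/04 HTTP/1.0"),
]

print(count_page_views(RECORDS, own_addresses={"192.0.2.7"}))
```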

Now, I can pretty much guarantee that those figures up there represent unique visits. A more interesting question to answer would be the number of repeat (or regular) visits. This is tougher, since most ISPs dish out dynamic IP addresses whenever someone reconnects, but I don't think it's impossible to get a ballpark figure. Taking the previous results, pulling out the unique IP addresses and sorting, I see for January 2002 (cutting off at 5 visits per address):


    197 65.116.145.137
     92 208.55.254.110
     63 211.101.236.143
     45 63.173.190.16
     30 64.129.118.129
     30 24.52.32.105
     20 211.101.236.79
     19 24.4.252.167
     15 208.60.8.130
     11 65.58.147.103
     11 164.77.128.210
     10 64.131.172.241
      9 66.157.2.122
      8 207.49.213.174
      7 65.2.207.3
      6 65.207.131.180
      6 64.39.15.82
      6 12.39.254.108
      5 64.30.224.30
      5 63.251.87.214
      5 212.250.100.122
      5 209.214.129.196
      5 208.1.105.145
      5 204.89.226.65
      5 196.41.28.43
      5 130.74.211.63

And so on. Easily a dozen repeat readers, but there are probably more. One way to find them would be to count the visits per block of IP addresses (most users fall into a range of addresses, usually along a classic class C block). Doing that, I get:


    197 65.116.145
     92 208.55.254
     83 211.101.236
     45 63.173.190
     30 64.129.118
     30 24.52.32
     21 64.12.96
     21 24.4.252
     18 208.60.8
     11 65.58.147
     11 208.1.105
     11 196.41.28
     11 164.77.128
     10 64.131.172
     10 152.163.189
      9 66.157.2
      9 216.10.44
      8 65.207.131
      8 212.250.100
      8 207.49.213
      8 205.188.209
      8 205.188.208
      7 65.2.207
      7 195.163.203
      6 64.39.15
      6 12.39.254
      5 64.30.224
      5 63.251.87
      5 24.51.202
      5 209.214.129
      5 204.89.226
      5 130.74.211
      5 129.74.252

Hmmm … not much difference really. Rerunning for last month (March) I get:


     86 208.55.254
     56 65.214.36
     22 64.131.172
     21 12.164.38
     20 218.45.21
     20 211.101.236
     19 66.176.111
     17 207.200.84
     17 129.74.186
     16 66.27.11
     16 151.203.23
     14 64.12.96
     12 64.129.118
     12 216.76.209
     11 196.41.28
     10 66.27.63
     10 64.30.224
     10 64.231.69
     10 194.222.60
      9 64.152.245
      9 24.51.200
      9 205.188.209
      8 64.90.36
      8 152.163.188
      7 64.158.38
      7 208.60.8
      6 205.188.208
      6 199.44.53
      6 199.174.3
      6 199.174.0
      6 195.163.203
      6 194.82.103
      5 64.34.18
      5 64.210.248
      5 24.71.223
      5 24.52.32
      5 207.158.192
      5 207.114.208
      5 204.89.226
      5 151.100.29
      5 128.242.197
      5 12.225.219

Oh, let's call it two dozen repeat readers and be done with it.
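The tallies above, per address and per block of addresses, can be sketched in Python along these lines. The function names are made up, and treating a "block" as the first three octets of the address is an assumption about what the listings show. The two 211.101.236 counts are the January figures from above; the 192.0.2 addresses are invented.

```python
from collections import Counter

def prefix_24(address):
    """Return the first three octets of a dotted-quad IPv4 address."""
    return ".".join(address.split(".")[:3])

def repeat_visitors(addresses, minimum=5, group_by=lambda address: address):
    """Tally visits per group and keep the groups seen at least `minimum` times."""
    counts = Counter(group_by(address) for address in addresses)
    # Highest count first, ties broken by the group name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(visits, group) for group, visits in ranked if visits >= minimum]

visits = (
    ["211.101.236.143"] * 63
    + ["211.101.236.79"] * 20
    + ["192.0.2.1"] * 3
    + ["192.0.2.2"] * 2
)

for count, group in repeat_visitors(visits, group_by=prefix_24):
    print(f"{count:7d} {group}")
```

Grouped by block, the two 211.101.236 addresses merge into a single line of 83, and the two invented addresses, neither of which reaches 5 visits alone, show up together as a block of 5.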

This is an interesting topic, and I would still like to know how Sean Tevis calculates his stats.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, which makes the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.