The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, July 16, 2002

The Ins and Outs of Calculating Browser Usage

I spent the past few hours writing a program to parse the browser string from the web server log files. Why didn't I use an existing web analyzer package? I wanted the browser strings to be rewritten to have correct information, as well as being in a more consistent style. This meant changing it from, say:

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)

to:

MSIE/6.0 Windows/98
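
The program itself isn't shown here, but a minimal sketch of this kind of rewrite might look like the following. The function name `normalize_user_agent` and the regular expressions are assumptions made for illustration; the sketch only recognizes Internet Explorer and Windows specially and falls back to the leading `Name/Version` token for everything else, so it will not reproduce every row in the tables (Gecko-based browsers, for instance, would come out as Mozilla/5.0).

```python
import re

def normalize_user_agent(ua: str) -> str:
    """Rewrite a raw User-Agent string as 'Browser/Version OS/Version'."""
    browser, os_name = "-/-", "-/-"

    # Internet Explorer claims to be Mozilla; its real identity sits
    # inside the parenthesized comment.
    msie = re.search(r"MSIE (\d+(?:\.\d+)*)", ua)
    if msie:
        browser = f"MSIE/{msie.group(1)}"
    else:
        # Otherwise take the leading 'Name/Version' token as-is.
        lead = re.match(r"([^\s/()]+)/(\S+)", ua)
        if lead:
            browser = f"{lead.group(1)}/{lead.group(2)}"

    # Only Windows is recognized in this sketch:
    # 'Windows NT 5.1' becomes 'WindowsNT/5.1', 'Windows 98' becomes 'Windows/98'.
    win = re.search(r"Windows (NT )?(\d+(?:\.\d+)?)", ua)
    if win:
        family = "WindowsNT" if win.group(1) else "Windows"
        os_name = f"{family}/{win.group(2)}"

    return f"{browser} {os_name}"

raw = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
print(normalize_user_agent(raw))
```

Unknown or empty strings fall through to `-/- -/-`, which matches how unidentifiable agents are shown in the tables.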

This also means I can generate decent stats about the popularity of certain browsers on the fly (using the Unix command line, I can pull out the browser string, feed that through the newly written program, then count unique browsers more easily). An initial run through last month's log file for my blog:

Browser Statistics for The Boston Diaries
# Hits Browser/Version OS/Version
1,228 Googlebot/2.1 -/-
748 MSIE/6.0 WindowsNT/5.1
712 MSIE/6.0 Windows/98
641 MSIE/6.0 WindowsNT/5.0
476 Mercator/2.0 -/-
371 MSIE/5.5 Windows/98
303 MSIE/5.0 Windows/98
302 MSIE/5.5 WindowsNT/5.0
238 -/- -/-
216 MSIE/5.01 WindowsNT/5.0
137 ia_archiver/- -/-
113 Syndic8/1.0 -/-
101 NCSA/- -/-
101 MSIE/5.01 Windows/98
100 MSIE/6.0 WindowsNT/4.0
99 Mozilla/3.01 -/-
89 Gecko/20020529 Linux/i686
88 Gecko/20020523 WindowsNT/5.0
81 MSIE/5.14 Mac_PowerPC/-
79 Mozilla/5.0 -/-
68 SlySearch/1.2 -/-
66 MSIE/5.5 Windows/95
62 MSIE/5.5 WindowsNT/4.0
62 Gecko/20020529 PPC/Mac
61 Openfind/- -/-
55 MSIE/5.0 Mac_PowerPC/-
49 Indy-Library/- -/-
48 Gecko/20020510 Linux/i686
42 Mozilla/3.0 -/-
41 -/-
40 Gecko/20020311 WindowsNT/5.1
38 MSIE/5.01 Windows/95
36 -/-
33 Gecko/20020530 WindowsNT/5.0
28 bumblebee/1.0 -/-
28 Gecko/20020510 WinNT4.0/-
27 Opera/6.02 Windows/2000
27 MSIE/5.0 WindowsNT/4.0
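
A table like the one above comes from pulling the browser string out of each log line and counting the distinct values. As a rough sketch of the extract-and-count steps only: the code below assumes the Apache "combined" log format, where the User-Agent is the last double-quoted field, and the sample lines are made up for illustration. The name `count_user_agents` is hypothetical, and the rewriting step would sit between extraction and counting.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the User-Agent is the
# final double-quoted field on each line.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Tally how often each User-Agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines, not taken from the real log file.
sample = [
    '192.0.2.1 - - [16/Jul/2002:01:02:03 -0400] "GET / HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [16/Jul/2002:01:02:04 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    '192.0.2.1 - - [16/Jul/2002:01:02:05 -0400] "GET /a HTTP/1.0" 200 99 "-" "Googlebot/2.1"',
]

# Print the most frequent agents first, hits right-aligned.
for agent, hits in count_user_agents(sample).most_common():
    print(f"{hits:>6,} {agent}")
```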

This gives a decent flavor for what's being used to view my site (out of the 7,943 hits last month, about 16% were from the Google spider) but one of the primary reasons I did this was to see just how many people are still using older browsers like Netscape 4x or Internet Explorer 4x (which would show up as Mozilla/4.x and MSIE/4.x respectively). So, stripping out the operating system column and looking at only the major version numbers, we get:

More Specific Browser Statistics for The Boston Diaries
# Hits Browser/major Version
2,210 MSIE/6
1,671 MSIE/5
1,228 Googlebot/2
543 Gecko/-
476 Mercator/2
238 -/-
142 Opera/6
141 Mozilla/3
137 ia_archiver/-
134 Mozilla/4
113 Syndic8/1
101 NCSA/-
79 Mozilla/5
68 SlySearch/1
61 Openfind/-
49 Indy-Library/-
45 MSIE/4
37 Netscape6/6.2
28 bumblebee/1
26 Netscape/7
24 BlogBot/1
22 Win32/-
22 Konqueror/3.0
20 Frontier/8.0
16 Internet/-
16 Ask-Jeeves/-
15 Mozilla/-
14 Microsoft/-
14 Konqueror/2.2
12 w3m/0.2
12 obidos/bot
12 Mozilla/4.7C-CCK-MCD
11 myownhomeblogindexingservicecrawler/-
11 htdig/3.1
10 Mozilla/3.x
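
Collapsing the first table into this one amounts to dropping the operating system column, truncating the version at the first dot, and summing the hits that now share a key. Here is a minimal sketch of that step, using a handful of rows copied from the first table. The helper `major_only` is a made-up name, and the sketch does not reproduce every special case in the table above (Gecko build dates, for example, would not collapse to Gecko/-). Because only a subset of rows is used, the totals come out lower than in the full table.

```python
from collections import Counter

def major_only(entry: str) -> str:
    """Reduce 'Browser/Version OS/Version' to 'Browser/MajorVersion'."""
    browser = entry.split()[0]             # drop the operating system column
    name, _, version = browser.partition("/")
    major = version.split(".")[0] or "-"   # '5.01' -> '5', missing -> '-'
    return f"{name}/{major}"

# A few rows from the first table, as (hits, entry) pairs.
rows = [
    (1228, "Googlebot/2.1 -/-"),
    (748, "MSIE/6.0 WindowsNT/5.1"),
    (712, "MSIE/6.0 Windows/98"),
    (641, "MSIE/6.0 WindowsNT/5.0"),
    (371, "MSIE/5.5 Windows/98"),
    (303, "MSIE/5.0 Windows/98"),
]

# Sum the hits for every row that collapses to the same key.
totals = Counter()
for hits, entry in rows:
    totals[major_only(entry)] += hits

for key, hits in totals.most_common():
    print(f"{hits:>6,} {key}")
```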

The bad news: 48% of the browsers were Internet Explorer 5x or 6x (although, surprisingly enough, I did get five hits from a Mozilla-based browser under OS/2). The good news, though, is that 58% of the hits were from browsers capable of viewing CSS without crashing. And speaking of horrible browsers that can't support CSS, about 2.5% were running Netscape 4x or IE 4x (they can see the site, only it doesn't look that great).
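
For anyone wanting to check the arithmetic, the shares can be worked out from the hit counts in the second table and the 7,943 total. This is just a sketch: the grouping of rows is an assumption, and since the table stops at ten hits, the results land near, but not always exactly on, the rounded figures quoted here and earlier.

```python
TOTAL_HITS = 7943  # total hits for the month, as quoted in the post

def share(hits: int, total: int = TOTAL_HITS) -> float:
    """Return hits as a percentage of total."""
    return 100.0 * hits / total

# Row groupings below are assumptions based on the second table.
ie_5_and_6 = 2210 + 1671       # MSIE/6 + MSIE/5
old_browsers = 134 + 45 + 12   # Mozilla/4 + MSIE/4 + Mozilla/4.7C-CCK-MCD
googlebot = 1228               # Googlebot/2

print(f"IE 5x/6x:           {share(ie_5_and_6):.1f}%")
print(f"Netscape 4x/IE 4x:  {share(old_browsers):.1f}%")
print(f"Google spider:      {share(googlebot):.1f}%")
```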

I also checked the log file for Spring's site (Hi honey!). 53% of her visitors are using Internet Explorer 5 or higher, or Mozilla (or Netscape 6 and higher). Only about 3% are using Netscape 4x or Internet Explorer 4x, which is pretty much on par with my site (the rest are mostly robots or experimental browsers).

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site:, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2020 by Sean Conner. All Rights Reserved.