The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, January 01, 2003

New Year's Day

Does this mean that the Season™ is finally over?

Yet more Gregory updates

Spring stopped by to visit Gregory in the hospital. He's doing better, but the fractures he received are a bit worse than first expected—each of the six ribs and the clavicle are broken in two spots so the bones are kind of swimming around there. But that appears to be the extent of the damage from his motorcycle accident, which is fortunate.

He'll probably be in the hospital for at least a week or so.

Now THIS is War Driving …

I am going to make an actual page for this sometime: all of us from portland going to codecon will have wireless equipment in our vehicles providing a roaming hotspot all the way down to san francisco. We should have at least three vehicles and as many as six during the trip.

c o d e r l o g: wireless caravan

A mobile network … the mind boggles …

Let's see … given sufficient laptops, 802.11b PCMCIA network cards and VoIP, I can see this being way more popular than CB ever was.

Thursday, January 02, 2003

Simple amusements

For those of us living in the United States, today is 01/02/03.

Okay, so I found it amusing …

Mickey Rourke on the Danish 20 Kroner

“The chin makes her look like Mickey Rourke in 9 Weeks,” he added, referring to the steamy 1986 film in which Rourke, who is famed for his skill in the boxing ring, has an affair with Kim Basinger.

Via a search engine request, Taking mickey out of Danish queen

I was going through the log files and one of the referers was to a search request for “queen margrethe+coints+pictures” (which was referencing the entry I did on some foreign coins). Intrigued, I started following links and I came to realize that not many Danish people are happy with the likeness of Margrethe II.

Fancy that.

Friday, January 03, 2003

A further lesson to the RIAA and the MPAA

So, there's been a slight change of plans. As you may remember (surely 2002 isn't too hazy yet), I serialized my most recent science fiction novel, Old Man's War, here in December, and this month I was going to put it up as shareware, a la Agent to the Stars. Well, I won't be doing that. The reason for this is that, well, I kind of sold it. Instead of being available as shareware, Old Man's War will be available either later this year or early next year in a hardcover edition from Tor Books, publishers of (among others) Orson Scott Card, Robert Jordan, Steven Brust and Teddy Roosevelt. Yes, really, Teddy Roosevelt. It's a reissue, I think, not one of those L. Ron Hubbard-eqsue “dictating from beyond the grave” situations.

Via InstaPundit, John Scalzi's Whatever: Change of Plans

Further proof that making intellectual property available increases sales of said intellectual property, and I certainly hope examples like this will drive the point home to the RIAA and the MPAA (through the skull if we're lucky).

Then again, perhaps we'll be lucky, they won't get a clue and implode instead.

One can only hope.

One last Gregory update

Gregory was moved to a rehabilitation center (for physical therapy) in Lauderhill last night, although there was some confusion as it appeared that Gregory was lost, either in transit or the bureaucracy (or both!), as the hospital had him discharged but the rehabilitation center had no record of him. When Spring finally found him at the center, the head nurse explained that the receptionist was clueless.

When we went to visit him tonight he was looking much better as he lay there in the adjustable bed watching the Miami-Ohio football game on TV. He's now able to move about a bit and will be discharged from the rehabilitation center tomorrow morning (which means I was wrong about the length of his stay by a few days). He's going to be all right.

Sunday, January 05, 2003

More Ins and Outs of Calculating Weblog Traffic

As I do occasionally, I run the stats for the Boston Diaries. I use some programs I wrote to pretty much manually go through the log files as I feel it gives me a better feel for the actual traffic I get than if I were to use a program like Analog. Besides, doing it this way I often times find interesting things going on with autonomous agents silently indexing websites for their own nefarious reasons (muahahahahaha!).

I suspect that most people who run their stats don't take the time to really look into the results, because it wouldn't surprise me if the reported stats for most bloggers are inflated quite a bit.

I ran the stats as I have in the past and noticed that I had a higher rate of traffic than normal; I usually get about 100 human hits per day but last month it looked more like 116 per day. Okay, not that big a spike but enough to make me curious as to what's going on. I look at some of the requests that are being counted as human hits and I see [output truncated somewhat]:

GET /2002/11/29 HTTP/1.0 200 Mozilla la2@unspecified.mail
GET /2002/11/29.1 HTTP/1.0 200 Mozilla la2@unspecified.mail
GET /2002/11/23.1 HTTP/1.0 200 Mozilla la2@unspecified.mail

Interesting … seems to be some unspecified robot. A quick query shows it to be from Spain, but other than that, no real information unless I want to track this down further. I'm not that curious, so add that to the list of agents to ignore and rerun the stats.

Still high: about 114 visits per day. Check the requests and find:

GET / HTTP/1.1 200 Mozilla/4.7
GET /2002/6 HTTP/1.1 200 Mozilla/4.7
GET /2001/10 HTTP/1.1 200 Mozilla/4.7
GET /2000/6 HTTP/1.1 200 Mozilla/4.7
GET /2002/5 HTTP/1.1 200 Mozilla/4.7

Now that is odd. Netscape 4.7 is usually a bit more verbose about what it is than just Mozilla/4.7. Looking up the address I see that it belongs to NameProtect®:

NameProtect, Inc.® is committed to setting the industry standard when it comes to trademark research and registration services. As one of the world's leading trademark research firms, we have helped thousands of entrepreneurs, businesses, attorneys, and other intellectual property professionals with trademark needs.

NameProtect®—About us

Oh how nice …

I probably wouldn't be so upset over these guys if they weren't trying to hide behind a browser, or if they respected the Robots Exclusion Protocol, but they don't do either (and I wonder what they'll think of my using their logo here? It won't be the first time I got a cease-and-desist letter for trademark violations; my first, and so far only, one was in September/October of 1998).

This section of your report includes information on generic top-level domain names (.com, .net, .org) and other country-specific domain name registrations that are similar to your name. Use this section to identify potential competitors and assess the potential for your web traffic to be diverted.

NameGuard Free Name Monitoring

Okay, so removing the “anonymous” NameProtect® robot and rerunning again, I see I'm now down to a more normal 106 human visits per day, but just on the safe side …

GET /2000/08/30 HTTP/1.0 200 Mozilla/3.0 (compatible)
GET /2000/08/28.2 HTTP/1.0 200 Mozilla/3.0 (compatible)
GET /2000/08/31.3 HTTP/1.0 200 Mozilla/3.0 (compatible)
GET /2000/08/19.1 HTTP/1.0 200 Mozilla/3.0 (compatible)
GET /2000/08/14.7 HTTP/1.0 200 Mozilla/3.0 (compatible)
GET /2000/08/15 HTTP/1.0 200 Mozilla/3.0 (compatible)

Large number of requests from this address. 143 to be exact, the majority on December 8th and requesting entries mostly from August of 2000. Hard to tell if this is an actual user or a robot someone is working on. If I filter these requests out, I get 101 human visits per day.

Which is about what I expect.

A brief snippet of overheard conversation

“Do you mind if I open the blinds?” I asked.

“No,” said Spring, “go ahead.” I head over to the sliding glass door. Spring starts singing: “Let the sunshine in! Let the sunshine in!”

“Cut it with the hippy crap,” I said, opening the blinds.

Monday, January 06, 2003

Maybe one day sanity will return to the airlines

Last Thursday I was flying to LA on the Midnight flight. I went through security my usual sour stuff. I beeped, of course, and was shuttled to the “toss-em” line. A security guy came over. I assumed the position. I had a button up shirt on that was untucked. He reached around while he was behind me and grabbed around my front pocket. I guess he was going for my flashlight, but the area could have loosely been called “crotch.” I said, “You have to ask me before you touch me or it's assault.”

He said, “Once you cross that line, I can do whatever I want.”

I said that wasn't true. I say that I have the option of saying no and not flying. He said, “Are you going to let me search you, or do I just throw you out?”

I said, “Finish up, and then call the police please.”

When he was finished with my shoes, he said, “Okay, you can go.”

I said, “I'd like to see your supervisor and I'd like LVPD to come here as well. I was assaulted by you.”

He said, “You're free to go, there's no problem.”

I said, “I have a problem, please send someone over.”

Via jwz's Livejournal, Federal V.I.P. Penn

I like Penn (of Penn and Teller). He's great. And I think it's wonderful that he's willing to fight (and can afford to fight) this craziness of airline security. I think (and I think Penn thinks the same) that it's insane that he gets special treatment just because he's famous (and if I may get cynical here, that may also mean he has the resources to make this look really bad, or the financial resources to fight this in court).

I'm suing Attorney General John Ashcroft and various federal agencies, to make them stop demanding that citizens identify themselves in order to travel. Not only airports, but trains, buses, and cruise ships are now imposing ID requirements. This violates several constitutional rights. Stop showing ID whenever someone asks (or demands) it, and you will start to discover just what your rights are.

John Gilmore, Entrepreneur

So between Penn and John Gilmore's suit against the Federal government on behalf of anonymous travel, this is slowly restoring my faith that this current lunacy (link via The Duff Wire) will go away soon.

Forget antiglobalization—we're already there …

To: Sean Conner <>
Subject: A smile for you
Date: Sun, 5 Jan 2003 19:41:17 -0600


I found this amusing, but you've prolly already seen it. Just in case …

Keep smilin',

What is globalization, one may ask.

Well, below here is probably the best example on the definition of globalization.

Question: Explain “globalization?”

Answer: Princess Diana's death

Question: How's that?

Answer: An English princess with an Egyptian boyfriend crashes in a French tunnel, driving a German car with a Dutch engine, driven by a Belgian who was high on Scottish whiskey, followed closely by Italian Paparazzi, on Japanese motorcycles, treated by an American doctor, using Brazilian medicines!

And this is sent to you by a Russian-Jewish Canadian, using Bill Gates' technology which he stole from the Japanese. And you are probably reading this on one of the IBM clones that use Philippine-made chips, and Korean made monitors, assembled by Bangladeshi workers in a Singapore plant, transported by lorries driven by Indians, hijacked by Indonesians and finally sold to you by a Chinese!

That's Globalization!

Well … there isn't much else to add to that I'm afraid …

Tuesday, January 07, 2003

Things that make you go “Hmmmmm …”

Following are the ten most alarming theories about September 11, the “war on terror,” and the future of the world. Feel free to accept them as gospel, study them as symptoms of a traumatized culture, or scoff at them as anti-American propaganda: I'm only the messenger. Personally, though, at this point the only person I hold above suspicion in the matter of September 11 is that poor kid with the goat.

Via jwz's Livejournal, Top Ten Conspiracy Theories of 2002

As if the Top Ten Conspiracy Theories of the JFK assassination weren't bad enough …

The thought never occurred to me

Normally, a small form would arrive in the mail from the DMV and I would fill it out, and send it back with a check to renew the registration on my car. Fairly painless.

Only this year, I never did receive that form.

I suspect that's due to never having actually updated the address on my driver's license (even though I have only ten days from moving to do so), since going to the DMV is like being in a real life version of Brazil, only twice as annoying and with no Harry Tuttle or Jill Layton to help out. But since I need to register my vehicle, and I suspect that not actually living at the address listed on my driver's license would cause a bureaucratic snafu the likes of which I've yet to see, I figure the best course of action would be to break down and get a new driver's license!

Spring did mention finding an empty and fast DMV when she went nearly two years ago so I figure I would give that office a try. It couldn't hurt, right?

Never mind I was in a foul mood by the time I got there due to traffic. Never mind that I missed the strip mall and had to circle back around across six lanes of heavy season traffic. Never mind that the XXXXXXXXXXXXX XXXX of a XXXXXXXXX woman couldn't make up her mind as to which XXXXXXX parking lane she XXXXXXXXX wanted to park in, or the XXXXXXXXX XXXXXXX XXXXXXX of a woman was trying to back into a space in a huge conversion van and tying up traffic to XXXXXXX XXXX and back. Never mind all that.


And then some!

So I decided to try the old DMV office I used to use. It wasn't quite as crowded as the people were only crowded up to the door but not out.


I gave up the notion of just waiting in the office. No way I was going to waste five, six days easy waiting for my turn to talk to a surly public employee who probably never heard of “fast friendly service.”

But there was a web address listed on the door, claiming that you could renew your driver's license, renew the registration and about half a dozen other things on-line!

What have I got to lose?

Five minutes.

And that was to change the address on my driver's license and renew my car registration.

For some reason, the thought that I could do this all on-line never occurred to me. Not once. Here I am, having had Internet access for twelve years now and it's still yet to sink in that other parts of society are now using this wonderful thing called the Internet!


Thursday, January 09, 2003

More things to make you go “Hmmmmmm …”

To: Sean Conner <>
Subject: Something to suck your time …
Date: Tue, 7 Jan 2003 23:35:44 -0500

Here's more on the “Things that make you go ‘Hmmmmm …’” topic you posted on your site today. This site goes into much more detail … very interesting … warning, big time sink ahead:

Forty years later and we still don't know all the details about November 22nd, so I'm wondering just how long it'll be before all the details about September 11th come out.

I don't suppose anyone alive today will ever know for sure …

Friday, January 10, 2003

Notes on surviving a Slashdot Effect

If you read “meta” sites like Slashdot, Kuro5hin, Fark, Met4filter (natch), and Memepool you've probably encountered links to stories that you can't reach—namely because the act of linking to a server not prepared for massive traffic has brought down the server, or worse, put the hapless soul over their bandwidth cap denying any use to anyone for the rest of the month or day or whatever time period the ISP or hosting provider uses to allocate bandwidth.

The ethics of linking

Mark and I have often gone back and forth about what we would need to do to survive a slashdotting if we ever got linked. Most of the solutions we've come up with so far center on distributing the affected site(s) to other servers and round-robining (is that a term?) between them (or some other form of load balancing). So far, that hasn't been a problem (and thankfully—we both have fears of being slashdotted and finding the slagged remains of the 33MHz 486 that is currently our server).

But one of the suggestions in “The ethics of linkage” is to redirect all requests back to Google, as they probably can't be slashdotted at all. By using mod_rewrite you can probably do something along the lines of:

RewriteEngine on
RewriteBase   /
# untested!  Use at own risk!
# be sure to change domain after "cache:" as needed
RewriteCond   %{HTTP_REFERER} ^http://.**
RewriteRule   ^.*$$1 [R,L]

But it would only help if the URLs that are being slashdotted exist in the Google cache; otherwise it does no good. For instance this entry, the very one you are reading now, has yet (as of January 10th, 2003) to be read and cached by Google, and it probably won't be cached for some time. So I can only hope that if this article gets slashdotted, it's after Google has googled it.

Which means that it is still a good idea to think of other ways of surviving a slashdotting, but for an ad-hoc method, this is probably a decent solution until we get something better into place.

Saturday, January 11, 2003

“Avast ye swabbies! Copyright and Trademark violations abound!”

To: Sean Conner <>
Subject: More run-ins with
Date: Sat, 11 Jan 2003 04:52:18 -0600

Early on, I found spiders from rummaging through a bunch of my dynamically-generated web pages (message boards, mailing list admin pages, etc.) Of course, there was no reverse DNS on the spider and it was claiming to be some version of Internet Explorer, but hitting pages once a second and crawling every day on a dynamically-generated calendar is a tip-off you're not dealing with a meth-addled web surfer. Rainman, perhaps, but definitely not a real human.

I don't have anything to hide but that's no justification for letting ill-mannered commercial robots rummage through the electronic equivalent of my sock drawer. I close the door when I'm in the bathroom. I wear pants. Modesty and privacy do not imply improper behavior. Besides, I have a few hundred megabytes of photos of improv comedy shows I've played in. I don't want my connection saturated because some anonymous robot was brainlessly and greedily slurping content that no human was ever going to enjoy, at least not in the way I intended. My network, my rules.

Email from Bob Apthorpe

Now I know bloggers' readership figures are inflated. I checked and sure enough, Cyveillance came ripping through my site last month for 213 hits that I didn't notice (I think I'm now down to 75 or so real human hits per day). Now, unlike NameProtect®'s rather terse use of Mozilla/4.7 as a user-agent, Cyveillance has gone to the other extreme:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0);Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 5.0);Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 4.0);Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 3.51)

I guess they're running their robot under Windows 2000 (reported as Windows NT 5.0), Windows NT 4.0 and Windows NT 3.51 and want to cover all the bases.

Brand Protection Solution

Cyveillance's Brand Protection Solution helps companies actively protect their brand equity by returning control over online brand integrity and use. By identifying and providing detailed intelligence on sites leveraging a company's brand for their own commercial purposes, Cyveillance enables companies to transform the Internet from a branding liability to a high-impact branding medium.

With Cyveillance's Brand Protection Solution, clients are able to accomplish the following:

Client Success Story

Many clients have leveraged Cyveillance's Brand Protection Solution to prevent revenue leakage and recoup lost dollars. For example, a large insurance agency leveraged Cyveillance's Brand Protection Solution because the client wanted to stop traffic diversion from its corporate Web site by other sites leveraging this client's name, logo and slogan to drive business. Cyveillance identified several hundred cases of sites diverting potential buyers away from this client's site. These cases included several in which the client's own agents were using the brand to drive traffic from the corporate Web site and others in which sites were using the recognizable name and logo in meta tags, URLs and titles.

With this knowledge, the client could immediately take action against the misrepresented sites, prevent further revenue leakage and strengthen brand equity.

Cyveillance Brand Management

Beautiful the way they phrase things, isn't it?

I would think that effective use of Google would be just as effective and possibly cheaper than hiring an outfit like NameProtect® or Cyveillance, but that's just me.

It would be nice if these sites would follow the Robots Exclusion protocol but nooooooooooooo!

My only consolation is that they find their way towards, because, you know, the name says it all, and besides, they just <SARCASM>loooooove robots</SARCASM> coming through their site.

Sunday, January 12, 2003

When shell scripts are faster than C programs

My stats program for The Boston Diaries basically consists of a shell script that calls a custom program (in C) to print out only certain fields from the website logfile, the output of which is then fed into a pipeline of some twenty invocations of grep. It basically looks like:

cat logfile |                                \
escanlog -status 200 -host -command -agent | \
        grep Mozilla |                       \
        grep -v 'Slurp/cat' |                \
        grep -v 'ZyBorg' |                   \
        grep -v 'bdstyle.css' |              \
        grep -v 'screen.css' |               \
        grep -v '^'|           \
        grep -v '^' |             \
        grep -v '^' |            \
        grep -v 'Ask Jeeves' |               \
        grep -v '' |              \
        grep -v '"Mozilla"' |                \
        grep -v 'Mozilla/4.5' |              \
        grep -v '.gif ' |                    \
        grep -v '.png ' |                    \
        grep -v '.jpg ' |                    \
        grep -v 'bostondiaries.rss' |        \
        grep -v 'bd.rss' |                   \
        grep -v 'favicon.ico' |              \
        grep -v 'robots.txt' |               \
        grep -v $HOMEIP

It's serviceable, but it does filter out Lynx and possibly Opera users since I filter for Mozilla and then reject what I don't want. Twenty greps; that's pretty harsh, especially on my server. And given that more and more robots are hiding themselves, it seems, the list of exclusions could only get longer and longer.

I think that at this point, a custom program would be much better.

So I wrote one. In C. Why not Perl? Well, I don't know Perl, and I have all the code I really need in C already. There's even a regex library installed on the system that I can call, so between that, the code I already have to parse a website log file, and an extensive library of C code to handle higher level data structures, it wouldn't take me all that long to write the program I wanted.

First, start out with a list of rules:

# configuration file for filtering web log files

reject	host	'^XXXXXXXXXXXXX$'	# home ip

filter  status  '^2[0-9][0-9]$'

reject  command '^HEAD.*'
reject  command '.*bostondiaries\.rss.*'
reject  command '.*/robots\.txt.*'
reject  command '.*/bd\.rss.*'
reject  command '.*/CSS/screen\.css.*'
reject  command '.*/bdstyle\.css.*'
reject  command '.*\.gif .*'
reject  command '.*\.png .*'
reject  command '.*\.jpg .*'
reject  command '.*favicon\.ico.*'

accept  agent   '.*Lynx.*'

filter  agent   '.*Mozilla.*'

reject  agent   '.*Slurp/cat.*'
reject  agent   '.*ZyBorg.*'
reject  agent   '.*la2@unspecified\.mail.*'
reject  agent   '.*Ask Jeeves.*'
reject  agent   '.*Gulper Web Bot.*'
reject  agent   '^Mozilla$'
reject  agent   '^Mozilla/4.7$'
reject  agent   '^Mozilla/4.5 \[en\] (Win98; I)$'
reject  agent   '.**'
reject  host    '^4\.64\.202\.64$'
reject  host    '^63\.148\.99.*'

The second column is which field you want to check from the logfile while the last column is the regular expression to match on. The first column is the rule to apply; basically, filter will continue processing if the field matches the regular expression, otherwise that request is discarded. reject will automatically reject the request if the given field matches the regular expression, and accept will automatically accept the request (both of these are forms of short circuiting the evaluation). Once all the rules are finished processing (or an accept is run), the resulting request is printed.

So, for example, the first line there rejects any requests from my home network (as requests from there would skew the results), and the second line will reject any request that doesn't result in a valid page. And we go on from there.
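For the curious, here is a minimal sketch of how rules like these might be evaluated using the POSIX regcomp() and regexec() calls. It is not the actual program; the names (struct rule, rules_compile(), rules_apply() and the field enumeration) are all made up for illustration, and it assumes the log line has already been split into fields and that the patterns are POSIX basic regular expressions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical names throughout; the real program's types are not shown. */
enum action { FILTER, REJECT, ACCEPT };
enum field  { F_HOST, F_STATUS, F_COMMAND, F_AGENT, F_MAX };

struct rule
{
  enum action  action;
  enum field   field;
  const char  *pattern;
  regex_t      re;       /* filled in by rules_compile() */
};

/* Compile every pattern once, up front.  Returns 0 on success, -1 if
   any pattern fails to compile (earlier ones are not freed here). */
int rules_compile(struct rule *rules,size_t n)
{
  size_t i;

  assert(rules != NULL);
  for (i = 0 ; i < n ; i++)
    if (regcomp(&rules[i].re,rules[i].pattern,REG_NOSUB) != 0)
      return(-1);
  return(0);
}

/* Returns 1 if the request should be printed, 0 if it is discarded. */
int rules_apply(const struct rule *rules,size_t n,const char *const fields[F_MAX])
{
  size_t i;

  for (i = 0 ; i < n ; i++)
  {
    int match = (regexec(&rules[i].re,fields[rules[i].field],0,NULL,0) == 0);

    switch(rules[i].action)
    {
      case FILTER: if (!match) return(0); break;  /* must match to go on */
      case REJECT: if (match)  return(0); break;  /* short circuit: drop */
      case ACCEPT: if (match)  return(1); break;  /* short circuit: keep */
    }
  }
  return(1);  /* survived every rule */
}
```

Each request's fields go into an array indexed by the field enumeration, and rules_apply() says whether to print the request. Compiling the patterns once up front, rather than once per request, keeps regcomp() out of the main loop.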

The programming itself goes rather quickly. That is good.

The program itself goes rather slowly. That is not good.

It is, in fact, slower than the shell script with twenty invocations to grep. It is, in fact, a C program that is beaten by a shell script.

That is really not good.

Time to profile the code. Now, it might be that the regex library I was calling was slow, but I discounted that—it should be well tested, right?


Results of profiling: each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds   calls  self ms/call  total ms/call  function
 65.69                0.90          0.90   11468          0.08           0.08  read_line
  8.03                1.01          0.11   11467          0.01           0.02  process_rules
  5.84                1.09          0.08  265271          0.00           0.00  NodeValid
  5.84                1.17          0.08   11467          0.01           0.01  read_entry
  2.92                1.21          0.04   11495          0.00           0.00  mem_alloc

So right away we can see that read_line() is sucking up a lot of time here. Let's see what it could be:

char *read_line(FILE *fpin)
{
  int     c;
  char   *buffer = NULL;
  char   *np;
  size_t  size   = 0;
  size_t  ns     = 0;

  while(1)
  {
    /* grow before the buffer fills so the trailing NUL always fits */
    if (size + 1 >= ns)
    {
      ns += LINESIZE;
      np  = realloc(buffer,ns);
      if (np == NULL) return(buffer);
      buffer = np;
      buffer[size] = '\0';
    }

    c = fgetc(fpin);
    if ((c == EOF) || (c == '\n')) return(buffer);
    buffer[size]   = c;
    buffer[++size] = '\0';
  }
}

Not much going on. About the only thing that might be sucking up time is the allocation of memory (since LINESIZE was set to 256). I ran a quick test and found that the longest line in the logfiles is under 1,000 characters, while the average size was about 170. Given the number of lines in some of the files (40,000 lines, which as logfiles go, isn't that big) and given that the code I have makes a duplicate of the line (since I break one of those up into individual fields) I'm calling malloc() upwards of 80,000 times!—more if the lines are exceptionally long.

Fast forward over rewriting and profiling. Since malloc() can be an expensive operation, I skip it and pass the memory to use into read_line() instead; and since we're going to make a copy of the line anyway, why not pass in two blocks of memory?

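void read_line(FILE *fpin,char *p1,char *p2,size_t size)
{
  int c;

  /* size is taken to be the full buffer size, so stop one short */
  /* to leave room for the terminating NUL                       */
  while((size-- > 1) && ((c = fgetc(fpin)) != EOF))
  {
    if (c == '\n') break;
    *p1++ = *p2++ = c;
  }
  *p1 = *p2 = '\0';
}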

Now, run it under the profiler:

Results of profiling: each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds   calls  self ms/call  total ms/call  function
 80.93                2.97          2.97       1       2970.00        3670.00  process
  8.99                3.30          0.33   11468          0.03           0.03  read_line
  5.18                3.49          0.19   11467          0.02           0.02  read_entry
  4.90                3.67          0.18   11467          0.02           0.02  process_rules
  0.00                3.67          0.00      31          0.00           0.00  empty_string

Much better. Now the actual processing is taking most of the time, so time to test it again on the server.

Still slower than the shell script.


I'm beginning to suspect that it's the call to regexec() (the regular expression engine) that is slowing the program down, but there is still more I can do to speed the program up.

I can remove the call to fgetc(). Under Unix, I can map the file into memory using mmap() and then it just becomes searching through memory looking for each line instead of having to explicitly call the I/O routines. Okay, so I code up my own File object, which mmap()'s the file into memory, and modify read_line() appropriately, compile and profile:

Results of profiling: each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds   calls  self ms/call  total ms/call  function
 42.86                0.06          0.06   11467          5.23           5.23  process_rules
 35.71                0.11          0.05   11467          4.36           4.36  read_entry
  7.14                0.12          0.01   11468          0.87           0.87  file_eof
  7.14                0.13          0.01   11467          0.87           0.87  read_line
  7.14                0.14          0.01       1      10000.00      140000.00  process

Finally! process_rules() finally shows up as sucking up the majority of runtime. Test it on the server and … it's still slow.
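The mmap() approach can be sketched roughly as follows. This is an illustration under assumptions, not the File object from the actual program: struct mfile and the mfile_*() functions are hypothetical names, error handling is minimal, and an overlong line is truncated with the remainder skipped.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the File object mentioned in the text. */
struct mfile
{
  char   *data;   /* start of the mapping     */
  size_t  size;   /* length of the file       */
  size_t  pos;    /* offset of the next line  */
};

/* Map a whole file read-only.  Returns 0 on success, -1 on failure
   (an empty file counts as a failure, since it cannot be mapped). */
int mfile_open(struct mfile *mf,const char *name)
{
  struct stat st;
  int         fd = open(name,O_RDONLY);

  if (fd < 0) return(-1);
  if ((fstat(fd,&st) < 0) || (st.st_size == 0))
  {
    close(fd);
    return(-1);
  }
  mf->data = mmap(NULL,(size_t)st.st_size,PROT_READ,MAP_PRIVATE,fd,0);
  close(fd);                    /* the mapping outlives the descriptor */
  if (mf->data == MAP_FAILED) return(-1);
  mf->size = (size_t)st.st_size;
  mf->pos  = 0;
  return(0);
}

int mfile_eof(const struct mfile *mf)
{
  return(mf->pos >= mf->size);
}

/* Copy the next line (minus the newline) into p1 and p2, each of
   which holds size bytes.  No I/O calls; just a search through memory. */
void mfile_read_line(struct mfile *mf,char *p1,char *p2,size_t size)
{
  const char *start = mf->data + mf->pos;
  size_t      left  = mf->size - mf->pos;
  const char *nl    = memchr(start,'\n',left);
  size_t      len   = (nl != NULL) ? (size_t)(nl - start) : left;
  size_t      copy  = (len < size) ? len : size - 1;

  assert(size > 0);
  memcpy(p1,start,copy); p1[copy] = '\0';
  memcpy(p2,start,copy); p2[copy] = '\0';
  mf->pos += len + ((nl != NULL) ? 1 : 0);   /* step past the newline */
}

void mfile_close(struct mfile *mf)
{
  munmap(mf->data,mf->size);
}
```

The loop around this would call mfile_read_line() until mfile_eof() reports the end of the mapping, which is where the per-character fgetc() calls go away.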

Okay, so now I know that regexec() is making a C program slower than a shell script with twenty invocations of grep. Just to satisfy my own curiosity, I crank up the optimizations the compiler uses (using gcc -O4 -fomit-frame-pointer ...) and create two versions: one with the call to regexec() stubbed out and one with the call still in. I then run both on the server, timing the execution of each.

Timings of two programs—one calling regexec() and one not
              Stubbed version  regexec() version
User time                6.16            2842.71
System time              0.21              41.02
Elapsed time         00:06.38           48:41.89

Not quite seven seconds for the stubbed version, and almost an hour for the one calling regexec(). And this on this month's logfile, which isn't even complete yet (approximately 11,400 lines). I'm beginning to wonder just what options RedHat used to compile the regex library.

I then search the net for a later version of the library. It seems there really is only one which nearly everybody doing regular expressions in C uses, written by Henry Spencer and last updated in 1997 (which means it's very stable code, or no one is using C anymore; both may be true). I sucked down a copy, compiled it and ran the regression tests it came with. It passed, so I recompiled with heavy optimization (gcc -O4 -fomit-frame-pointer ...) and used that in my program:

Results of profiling: each sample counts as 0.01 seconds.
% time  cumulative seconds  self seconds   calls  self ms/call  total ms/call  function
 60.68                1.96          1.96  424573          0.00           0.00  sstep
 21.67                2.66          0.70  137497          0.01           0.02  smatcher
 10.84                3.01          0.35   20124          0.02           0.11  sfast
  2.79                3.10          0.09  137497          0.00           0.02  regexec
  2.17                3.17          0.07   11467          0.01           0.28  process_rules

Okay, now we're getting somewhere. Try it on the server and it runs in less than a minute (which is tolerable, given the machine).


I guess the main point of this is that when you profile code, make sure that any libraries you may be using are also being profiled! I could have saved quite a bit of time if I had known for a fact that it was the regex library that was slowing me down (I could have, in fact, just done the stub thing first to see if it was indeed my code, but alas … ). But even so, it was a fun exercise to do.
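A cheap way to check a library call in isolation, when a profiled build of the library isn't handy, is to time it directly. The sketch below does that for regexec() using clock(); time_regexec() is a hypothetical helper, not something from the program above, and it assumes POSIX basic regular expressions.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Run `pattern` against `text` `count` times.  Returns the CPU seconds
   spent in the loop, or -1.0 if the pattern does not compile.  The
   number of successful matches is stored through `matches`. */
double time_regexec(const char *pattern,const char *text,unsigned long count,unsigned long *matches)
{
  regex_t       re;
  clock_t       start;
  clock_t       stop;
  unsigned long i;

  assert(matches != NULL);
  if (regcomp(&re,pattern,REG_NOSUB) != 0) return(-1.0);

  *matches = 0;
  start    = clock();
  for (i = 0 ; i < count ; i++)
    if (regexec(&re,text,0,NULL,0) == 0)
      (*matches)++;
  stop = clock();

  regfree(&re);
  return((double)(stop - start) / CLOCKS_PER_SEC);
}
```

Feeding it one of the patterns from the rules file and a typical log line, with count set to roughly the number of regexec() calls the profiler reports, gives a ballpark for how much of the runtime belongs to the library rather than to my own code.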

Tuesday, January 14, 2003

Russian Ark

Hence there was but a single shooting day with four hours of existing light. Thousands of people in front of and behind the camera simply had to work together perfectly. The Hermitage was closed and restored to its original condition allowing cinematographer Tilman Buttner to travel through the Museum through an equivalent of 33 studios, each of which had to be lit in one go to allow for 360-degree camera movements. All of this was accomplished within a vulnerable environment that holds some of the greatest art treasures of all time, from Da Vinci to Rembrandt. After months of rehearsals, 867 actors, hundreds of extras, three live orchestras and 22 assistant directors had to know their precise positions and lines.

Via, Russian Ark: Production Notes

A movie filmed in one 90-minute take.


While this wasn't the first film planned as a “real time” film, it's the first to be shot continuously in real time; no film reloading here. Hitchcock did one, but if you watch the film, you'll notice that about every 10 to 12 minutes the camera will track to a weird location (like the back of someone's black coat) and then continue on. It was during these “tracking shots” that the action was stopped and the camera reloaded before filming resumed, so that, if need be, any ten minute segment of that film could conceivably be reshot without destroying the “continuity” as it were.

Continuous tracking shots are hard to do. Robert Altman's The Player has as its opening shot the longest (at the time) and most technically complicated tracking shot in a film pushing the limits of a single film canister (and if you look closely at the shot, you'll see Robert Altman himself, pitching a sequel to The Graduate). Russian Ark, however, is orders of magnitude beyond that.

Eight hundred and sixty-seven actors!

My mind is boggling at the thought.

Information wants to be available …

The reasons are clear enough: in an attention economy, the key is to capture customers and keep them focused. The dojinshi market does exactly that. Fans obsess; obsessions work to the benefit of the original artist. Thus, were the law to ban dojinshi, lawyers may sleep better, but the market for comics generally would be hurt. Manga publishers in Japan recognize this. They understand how “theft” can benefit the “victim,” even if lawyers are trained to make the thought inconceivable.

Via dive into mark, What lawyers can learn from comic books

I've linked to a few other articles where making intellectual property (books, music, comics) easily available helps sales in the long run, even if it may facilitate an apparent “pirate market” in the short run. And this article by Lawrence Lessig expresses that point all the more so (and while he is correct in his reference to the potential legal action by Sony against someone who hacked the Aibo, I can see Sony's side of the picture—they were trying to limit their liability if someone saw the information, hacked their Aibo and broke it, then tried to return it to Sony possibly despite language in their warranty that modifications to the Aibo will void it; Sony has since changed their mind).

Thursday, January 16, 2003

Snippets of an overheard conversation dealing with a convalescing motorcycle accident victim and his erstwhile friend who has a remarkable revelation to make

“Gregory, you've been a lot nicer since your accident, you know that?”

“I guess having a near-death experience will do that to you.”

“It's either that or the drugs.”

Alphabet Soup of the Viet-Cong

I started to convert my website to XML back in October but what with the holiday season and what not, that particular project got pushed back. But the past week or so I've resurrected that particular project and I've been immersed in XML, XSLT, HTML and CSS and other alphabet soups of technology.

I had started with converting my humor columns over to XML (of which I had converted about half) and wrote an XSLT file to convert them to XHTML. Earlier this week, I picked up where I left off, converting the rest to XML and tweaking the templates. Given the recent brouhaha over XHTML, plus an inability of some older browsers to properly handle the XHTML markup, I rethought the notion of using XHTML and went back to HTML 4 strict (which is now an easy thing to do given I'm using templates).

I had started with one of the lower sections of my website and was working my way up (I had finished with Murphy's Law, now time to work on the High-Brow Literary Section) when I started having integration problems, mainly with XSLT. I was working on one template file for the writing section, and I already had a separate template file for the columns.

<?xml version="1.0" ?>


<xsl:include href="murphy/murphy.xsl"/>


When I wasn't getting errors I was getting odd results. Perhaps it was still my unfamiliarity with XSLT and the differences between <xsl:include> and <xsl:import> but I was having a difficult time trying to locate the source of the odd results, like spurious output when there shouldn't have been any.

I then switched to a top-down conversion, with a single XSLT file. I rewrote what I had, making naming changes to clarify which template was which and what was going on.

Whatever I did, it cleared up the problems I was having.

It's slow going, and XSLT is not the prettiest of languages to program in (and yes, it is Turing Complete so it is a programming language) and I'm still trying to get used to XPath expressions.

<xsl:call-template name="common-meta-tags">
  <xsl:with-param name="year"><xsl:value-of select="substring(@ref,1,4)"/></xsl:with-param>

<xsl:if test="position()&gt;1">

<xsl:if test="position()&lt;last()">

site.xsl—Portion of code to generate the index of about pages

Yes, the relational operators like < have to be encoded as &lt; since this is XML—like I said, it's not pretty. And certain XPath expressions can use the short form, while others (such as selecting adjacent nodes with preceding-sibling and following-sibling) have to use the fully qualified notation and you have to know when that is (of the thirteen axes you can step by, child, descendant, descendant-or-self, parent, ancestor, ancestor-or-self, following-sibling, preceding-sibling, following, preceding, attribute, namespace or self, only five can be expressed in a shorthand notation: self (as “.”), parent (as “..”), child (as the name of the element), descendant-or-self (as “//”) or attribute (with a “@” preceding the name of the attribute)).
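To make the escaping point concrete, here is a minimal sketch in Python (not part of site.xsl, and not how the site is actually built) that uses only the standard library's XML parser. The fragment and the function names are made up for illustration; the two test expressions mirror the ones quoted above. It shows that the parser hands the decoded expression to whatever reads the stylesheet, and that a raw < inside an attribute is rejected outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet fragment; only the two test expressions
# come from the excerpt above.
FRAGMENT = """\
<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="item">
  <xsl:if test="position()&gt;1">, </xsl:if>
  <xsl:if test="position()&lt;last()">more</xsl:if>
</xsl:template>
"""

# ElementTree spells namespaced tags as {namespace-uri}localname.
XSL = "{http://www.w3.org/1999/XSL/Transform}"


def decoded_tests(source):
    """Return each xsl:if test attribute as the XML parser decodes it."""
    root = ET.fromstring(source)
    return [node.get("test") for node in root.iter(XSL + "if")]


def is_well_formed(source):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(decoded_tests(FRAGMENT))
    print(is_well_formed('<a test="1&lt;2"/>'))  # escaped form
    print(is_well_formed('<a test="1<2"/>'))     # raw < in an attribute
```

So the ugliness lives only in the source text: by the time the expression is evaluated, the entity has already been turned back into the plain operator.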

I hope you got all that (and I suspect I just lost all my readers at this point).

You also don't have variables, even though <xsl:variable> would lead you to think so; it's more a named constant than a variable.

It's stuff like this that reminds me of the Vietnam draftee Kansas farm boy walking through the jungles of South East Asia oblivious to the various trip wires the Viet-Cong have planted …

Friday, January 17, 2003

Twenty-first century CB radio …

About two weeks ago I made a brief mention of a roving wireless network. There is now a webpage with more information.

Very geeky, and very cool …

Sunday, January 19, 2003

“I'm turning Japanese, I think I'm turning Japanese, I really think so.”

JeffK mentioned that over the past few days, when he views The Boston Diaries his browser asks if he wants to download and install Japanese language support. I found the notion odd, but like some stories I've heard, computers can be affected by weird things so it was remotely possible that for whatever reason his browser felt the need to install Japanese language support whenever my page was loaded.

So we head over to his computer and as he's bringing up my blog, it suddenly hits me why his computer is asking to install Japanese language support: my entry on the 14th! (don't worry, it's fixed for now).

When writing English, I was taught that you italicize foreign words. Easy enough to do in HTML, just slap some <I> tags around the word and be done with it. But semantically that doesn't really mean anything, what with the semantic web being a current hot topic and all. While it's apparent to most readers that garçon is French and über is German, what about slumpmässig? Could be German for all you know (it's not—it's Swedish). By using the features inherent in HTML we can add semantics to foreign words beyond just italicizing them.

And that's what I do, in fact. For a foreign word like slumpmässig I'll encode it up like:

<I LANG="sv" TITLE="chance; luck, hazard">slumpm&auml;ssig</I>

Certain browsers, like MSIE and Mozilla, will display a tooltip with the text in the TITLE attribute, where I stick the translation of the word (if you happen to be using MSIE or Mozilla, try holding the mouse over a foreign word), and an intelligently programmed HTML vocalizer (used perhaps, by the blind to speak pages) can use the language tag to help recognize which language the word is written in and use that to guide the pronunciation.

Semantically much better than just <I>slumpm&auml;ssig</I>.

So, when I wrote that entry on the 14th I did what I've been doing now for some time and slapped some semantics around the Japanese terms.

The <I LANG="ja" TITLE="fan art">dojinshi</I> market .... <I LANG="ja" TITLE="comic book">Manga</I> publishers ...

They are Japanese terms after all.

Since I seem to already have the Japanese language support installed I didn't notice anything odd when I loaded the page to proofread the entry. But it seems that other browsers that don't have the Japanese language support saw the language attribute for “Japanese,” realized it wasn't installed, and so decided to ask the user if it was okay to install Japanese language support. But I'm using an Anglicized spelling for a Japanese word so there's no real need to download Japanese language support for what I used, so how do I get around that?

That, I don't know. I'm fudging it right now by using LANG="x-ja" which is allowed (any language code starting with “x” is for private use; that shouldn't trigger any download message from browsers—it's intended for words like Nazgûl which don't have an officially designated language), which I suppose, is better than nothing.
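As a rough illustration of the rule I'm leaning on, here is a small Python sketch (nothing the site actually runs; the function names and sample markup are invented for this example) that collects the LANG values from a chunk of HTML and separates the private-use “x-” tags from the rest. Whether a given browser really skips its download prompt for a private-use tag is up to the browser; this only shows the syntactic test.

```python
from html.parser import HTMLParser


def is_private_use(lang):
    """True if the language tag starts with the private-use prefix "x-"."""
    return lang.lower().startswith("x-")


class LangCollector(HTMLParser):
    """Collects the value of every LANG attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so LANG arrives as "lang".
        for name, value in attrs:
            if name == "lang" and value:
                self.langs.append(value)


def collect_langs(markup):
    """Return all LANG values found in the markup."""
    collector = LangCollector()
    collector.feed(markup)
    collector.close()
    return collector.langs


def registered_langs(markup):
    """Return only the LANG values that are not private-use tags."""
    return [lang for lang in collect_langs(markup) if not is_private_use(lang)]


# Made-up sample in the style of the entries discussed above.
SAMPLE = (
    'The <I LANG="x-ja" TITLE="fan art">dojinshi</I> market and '
    'the <I LANG="fr">gar&ccedil;on</I>'
)

if __name__ == "__main__":
    print(collect_langs(SAMPLE))
    print(registered_langs(SAMPLE))
```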

Update on Saturday, September 23rd, 2023

I think it's more semantically correct to use the <I> tag than the <SPAN> tag to mark foreign words, so I'm going back and making that change.

Monday, January 20, 2003

A night of driving

Spring and I met Mark at JeffK's house, and from there we headed over to Kelly's for a night time (or rather, very early morning time) wardriving session. It was planned earlier that day pretty much on a whim, just to see what we could find. Mark had thought we would only find two WAPs; I on the other hand thought we would find more than a dozen. Since I had the largest car at the moment (Mark's BMW is currently in the shop) I was the designated driver for tonight's festivities.

Once at Kelly's we piled into my car—me in the driver seat, Mark got shotgun, with JeffK, Spring and Kelly in the back seat. To ensure that the computer would remain operational for the duration of the trip, Mark brought along a transformer that ran off the cigarette lighter and provided 110 volts AC to avoid having to use the laptop batteries; Mark and Kelly were both using laptops.

As we started driving off, Kelly realized that he had neglected to install the proper scanning software on his laptop, so we returned to his house. We sat in the driveway for some twenty minutes as Kelly attempted multiple times to download the software via his wireless access point. While he was doing that, I took the opportunity to take a few pictures while Mark checked his email.

Once the software was installed, we started wardriving. We determined that the scanning software worked better with Mark's wireless network card, so that was installed in Kelly's machine. Less than a mile from Kelly's house we hit our first WAP. The position was recorded for later analysis.

We then headed south down towards Cypress Creek Blvd., which has a high concentration of high tech companies in South Florida. Once there we headed east towards I-95. It was pretty quiet until we turned into a large corporate park where we hit about three WAPs right there; one closed off, two open. One of the open ones belonged to a hotel—we assumed for the benefit of their customers.

From there we then headed south along Powerline Road and while we would occasionally get a stray signal we couldn't get a strong lock on any one WAP. Fortunately there was very little in the way of traffic since we were (or rather, I was) driving quite erratically, taking sudden turns, backing up, driving very slowly, attempting to track down the stray signals.

After that, we tried driving through Margate and in the north western corner of the city (18th near 80th, where I used to live during high school) we found one WAP, although we couldn't find it again in three additional passes, after which we decided that it would be best to move on lest we attract unwanted attention.

We then headed back towards Kelly's and on the way, we were able to pin down the first WAP we found to one of two houses in a gated community. Oddly enough, the scanning software incorrectly identified the openness of Kelly's WAP. Go figure.

So we found more WAPs than Mark thought we would, but fewer than I expected. Not too bad for something planned at the last minute. This is something we are planning on doing again, only this time with better planning and hopefully, better software.

Friday, January 24, 2003

A small work reunion

I had another reunion today, this time a reunion of people who formerly worked for XXXXXXXXXXXXXXXXX, an ISP we all worked for in Boca Raton. The venue for today's lunch was Lucile's Bad to the Bone BBQ, a favorite locale for us high tech workers in Boca Raton.

Rob and I arrived first to find the restaurant without power (in fact, the entire strip mall was out of power). Consequently the available menu was quite limited today—grilled items only and no soda (except for diet root beer, which was bottled) or draft beer. Despite the lack of power they were still doing brisk business and it certainly didn't bother us that much—burgers and ribs were still available.

Joining us for lunch were Tim (the former web designer), NeoMike (former sysadmin) and R (former network admin), who was still working there as a consultant. The place has passed from being an ISP to a bona fide spam house with fairly insane security (as Rob found out trying to visit R at the office) which is understandable—being a spam house and all.

Quite a bit of conversation centered around the duties that R does for SpamHouse. As reprehensible as spam is, the technical challenges do sound fascinating what with having to route traffic out one circuit with inbound traffic coming in from another, and updating the BGP tables to keep connectivity. It's the technical challenge that makes the job interesting; something like the technical challenges inherent in writing a virus. Or nuclear weapons.



Complete chaos.

Bob, the DM of the Friday Night Game, set up a WAP so I borrowed the wireless network card from Rob to test the network (obviously it works). But it's a mad house here right now with scores of people visiting (since he himself is having a reunion of friends tomorrow).

Oops, food is here. Gotta go …

Monday, January 27, 2003

Trading towers

I had such the headache this morning; my eye balls felt like popping out. Quite bad. And yet the pain brought these weird thoughts to mind.

“What weird thoughts?” asked Spring.

“I have this odd mental picture—Trading Spaces in Middle-Earth. Sauron and Saruman agreed to switch towers and redecorate one of the rooms in two days and only $1,000 budget.”

Spring giggled. “Elrond and Frodo exchange houses. Let's see what Elrond does with the Hobbit hole …”

Napkin holder found at McDonald's

Look Up for Napkins

A disturbing trend

Rob knocked on the bathroom door. “Sean! Spring cut her hand!”

“What?” I said. The door had muffled Rob.

“Spring cut her hand!”

“Okay, I'll be out in a second.” I finished up with the business at hand, then grabbed the gauze bandages and hydrogen peroxide. I couldn't find the medical tape that I knew I had somewhere, so I ran downstairs with the supplies I had.

“Spring cut her hand on the futon,” said Rob. Spring, barely standing, was bent over the kitchen sink, holding her right hand under a stream of water. Rob had expressed interest in a futon and earlier today Spring found one while shopping and bought it. Then later she and Rob went back to the store (since Rob has a vehicle large enough to transport it) to pick it up, and while attempting to carry it into the Facility in the Middle of Nowhere it closed on her hand, slicing her middle finger. Rob ripped open some of the gauze bandages while I went back upstairs to find the medical tape.

“I couldn't find any,” I said a few minutes later.

“Well, grab any type of tape. Duct tape, electrical tape. Anything,” said Rob. I located the tape box in the kitchen and pulled out a roll of electrical tape. “Perfect,” said Rob. I pulled off a long piece of it and handed it to him. “This may hurt,” he said and started wrapping it around the bandages applied to Spring's hand. We then drove to the Emergency Room.

This is the third time in four months I've been to the emergency room (for Rob in September and then Gregory in December). Prior to September I think I've been to the emergency room twice in my life (that I can recall).

This is not a good trend.

They took Spring in immediately and rebandaged her hand. Then it was about an hour before they were able to fully examine the wound. An X-ray (nothing broken, no cut tendons), four stitches and a prescription later she was released. She should be fine by next week when the stitches come out.

Notes made at an Emergency Room on a found piece of paper and pen using a Woman's Day Magazine as a tablet while waiting for Spring.

Tuesday, January 28, 2003

We're number one!

Woo hoo!

According to State Farm, South Florida has the number one most dangerous intersection in the United States! Flamingo Road and Pines (aka Hollywood) Boulevard.

South Florida is also home to the top three most dangerous intersections in Florida!

While I haven't personally seen the #1 spot, I have, however, driven through the other two dangerous intersections, Sunrise and University and Commercial and University and frankly, I don't see much difference between those and any other intersection in South Florida.

Oddly enough, all three are in Broward County and out west (depends on who you ask what “west” means—for some it's anything west of I-95; for others it's anything west of US-441; generally by University you are in the western portion of South Florida, at least as far as Broward County is concerned).

Wednesday, January 29, 2003

Call, while supplies last …

Does anyone want a cheap Nigerian 419 Scam email knockoff? It's from Zimbabwe but it has all the hallmarks of your classical Nigerian 419 Scam (but only for a paltry US$10,000,000 I'm afraid). I'm tired of the silliness and I know that some of my friends have yet to receive one, so I thought I'd spread the wealth …


I've had my eye on that state for a while: it seems that whenever constitutional evil is being perpetrated in this country, Florida is mixed up in the mess, and never in a good way.

Florida and the Death of Justice

I feel like I'm in an episode of Connections.

It starts with Spring, looking on the Internet to buy a box of blank white cards—the type she used in language class years ago. She comes across 1000 Blank White Cards, a game created in Boston (Cambridge, but close enough), named after a box of 1000 blank white cards (used by language students, like Spring, to make flash cards) and inspired by Nomic (a game where, like law, you can change the rules). She checks the site out, and finds a link to The Boston Diaries, which has nothing to do with Boston except the name. The author of The Boston Diaries then starts searching for more information on 1000 Blank White Cards and comes across the Seattle Electric Grimmeldeck, which is possibly named after James Grimmelmann, who obviously plays the game and has written an article about Constitutional Law and Florida—Florida being the state where both Spring and the author of The Boston Diaries live.

Where's James Burke when you need him?

Doesn't matter if you're rich or poor, associations are eeeeeeevil!

When Southampton decided, this fall, to place a limit on the size of all new houses, it settled on twenty thousand square feet, on the ground that that figure represents a reasonable limit, given the big-house norms of the area. At twenty thousand square feet, a house has perhaps ten or eleven bedrooms, a dozen bathrooms, a six-car garage, and maybe, oh, a mini-trading floor for the kids. By comparison, Rennert's house, at forty-two thousand square feet, has twenty-nine bedrooms, thirty-three bathrooms, and two bowling alleys. What the Town of Southampton was saying, in other words, is that twelve bedrooms and one bowling alley is fine, but twenty-nine bedrooms and two bowling alleys is not. Think of the twenty-thousand figure as the community standard—a social consensus—for the maximum size a Hamptons monster home ought to be. With that extra bowling alley and those seventeen additional bedrooms, Rennert just went too far.

Sagaponack Homeowners Association vs. Ira Rennert

It's warming to know that even the insanely stupid rich have problems with associations. Oh, and who is this Ira Rennert to whom a 20,000 square foot home is just too small? Oh, one of those robber baron CEO type people with a penchant for funnelling money into his pocket.

Obligatory Picture

[“I am NOT a number, I am … a Q-CODE!”]

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site:, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.