Saturday, July 03, 2004
Take me out to the ballpark
Through Spring's church, we obtained tickets to a minor-league baseball game at Roger Dean Stadium, which is located at the north campus of FAU. I wasn't all that thrilled, since this is baseball we're talking about here; I find watching paint dry to be more stimulating than watching a bunch of tobacco-chewing, groin-scratching men stand around a dirt field for a few hours. I find football to be more watchable than baseball (football also has the advantage of having cute cheerleaders).
But I said I would go, if only for the opportunity to take pictures.
Which I did (and I ran out of storage space, having taken 115 pictures—luckily I'm not subjecting you to every last one of them).
I must admit that the minor-league game was a bit more enjoyable to watch than a major-league game. The first half of the game went by quickly; the first four innings screamed by in a little over an hour. It was the last five innings that took the remaining two hours.
Sure, there was the seventh-inning stretch, where we listened to Kate Smith's rendition of “God Bless America,” but there were also a bunch of silly events for kids, like a race from 1st to 2nd base with the Palm Beach Cardinals mascot, or the remote-controlled car race from 1st to 3rd, that took time between innings.
But I suspect that the teams started playing harder in the last half, and for a while in the 8th it looked like the Palm Beach Cardinals might pull out ahead, but no, in the end, the Jupiter Hammerheads won 4 to 2.
There was a fireworks show after the game—an impressive display with patriotic songs blaring away through the speakers, like Bruce Springsteen's “Born in the U.S.A.,” Neil Diamond's “America,” a John Philip Sousa march and John Cougar Peter Paul Mary Tiger Mellencamp Jingleheimerschmit's “R.O.C.K. in the U.S.A” among others. This was a Good Thing™ since Spring has to work tomorrow and it would be very unlikely we would be able to make any fireworks show (and amazingly enough, this year The Kids (especially The Younger) haven't had fireworks fever—man, was it bad last year).
Sunday, July 04, 2004
Blow Stuff Up Day
Have a safe and wonderful Fourth (and hopefully, this won't happen to you).
Wednesday, July 07, 2004
Thoughts about Gmail
I have a Gmail account (sean.conner@gmail.com—okay Google, let's test your anti-spam measures) and I've been playing around with it today. For a web application it's pretty slick. Instead of placing email within folders, you can add labels to each email. So I can create a label called “Friends” and tag email from my friends with it. So far, not much different from folders, but the kicker is that you can attach multiple labels to each email. So for instance, I can create another label “Mississippabamasisboombah” and if I were to receive email from Hoade, I can label it with both “Friends” and “Mississippabamasisboombah.” The interface to create labels and tag emails is easy.
It also seems to automatically apply labels to incoming email, probably based upon the content of emails already labeled. Email I received from my friend Ken Maier (whom I gave an invite to) was automatically labeled. Both slick and scary at the same time if you ask me, especially since I have privacy concerns about Gmail.
[Screenshot of Gmail's empty trash view: “No conversations in the trash. Who needs delete when you have 1000 MB of storage?!”]
Gmail trash filter
There also doesn't seem to be a way to export your email from Gmail. Granted, you have a gigabyte of storage and I'm sure that, given their Google File System, losing data isn't that much of a concern, but still, I would like a local copy of my email just the same, thank you very much. Ken did send me a link to this tool that will supposedly download email stored at Gmail, but I think it may fall foul of Gmail's Terms of Use:
5. Intellectual Property Rights. Google's Intellectual Property Rights. You acknowledge that Google owns all right, title and interest in and to the Service, including without limitation all intellectual property rights (the “Google Rights”), and such Google Rights are protected by U.S. and international intellectual property laws. Accordingly, you agree that you will not copy, reproduce, alter, modify, or create derivative works from the Service. You also agree that you will not use any robot, spider, other automated device, or manual process to monitor or copy any content from the Service. The Google Rights include rights to (i) the Service developed and provided by Google; and (ii) all software associated with the Service. The Google Rights do not include third-party content used as part of Service, including the content of communications appearing on the Service.
Gmail's Terms of Use (emphasis added)
That bit about “manual process to monitor or copy any content” is a bit worrying too; if I forward all my email (Gmail seems to lack a “bounce” feature, which sucks as I use that quite often), would that fall under the “manual process to copy any content”? Even if it's my own content? Remember, “Google reserves the right to refuse service to anyone at any time without notice for any reason.” Fall foul of Google, and poof, there goes your access to your email.
I think Spring has the right idea for Gmail—she uses it for her mailing lists; since that email goes out to multiple recipients it's not exactly private, and most mailing lists keep an archive anyway, so losing the use of Gmail isn't that bad; it's only bad if Gmail is the primary home for your email.
Saturday, July 10, 2004
Is profiling even viable now?
Mark brought up (in email) an interesting optimization technique using GCC 3:
I came across an interesting optimization that is GCC specific but quite clever.
In lots of places in the Linux kernel you will see something like:
    p = get_some_object();
    if (unlikely(p == NULL))
    {
      kill_random_process();
      return (ESOMETHING);
    }

    do_stuff(p);

The conditional is clearly an error path and as such means it is rarely taken. This is actually a macro defined like this:
    #define unlikely(b) __builtin_expect(b, 0)

On newer versions of GCC this tells the compiler to expect the condition not to be taken. You could also tell the compiler that the branch is likely to be taken:
    #define likely(b) __builtin_expect(b, 1)

So how does this help GCC anyhow? Well, on some architectures (PowerPC) there is actually a bit in the branch instruction to tell the CPU's speculative execution unit if the branch is likely to be taken. On other architectures it avoids conditional branches to make the “fast path” branch free (with -freorder-blocks).
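To see the idiom outside the kernel, here's a minimal, self-contained sketch (my own example with made-up names, not something from Mark's email). It requires GCC; on x86 the payoff, if any, comes from block reordering rather than a branch-hint bit:

    /* compile with something like: gcc -O2 -freorder-blocks example.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define unlikely(b) __builtin_expect((b),0)
    #define likely(b)   __builtin_expect((b),1)

    static size_t count_chars(const char *s)
    {
      /* error path; hinted as cold so GCC can move it off the fast path */
      if (unlikely(s == NULL))
      {
        fprintf(stderr,"count_chars: NULL input\n");
        exit(EXIT_FAILURE);
      }

      /* common case; hinted as hot */
      if (likely(*s != '\0'))
        return strlen(s);

      return 0;
    }

    int main(void)
    {
      printf("%lu\n",(unsigned long)count_chars("hello"));
      return 0;
    }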
I was curious to see if this would actually help any, so I found a machine that had GCC 3 installed (swift), compiled a version of mod_blog with profiling information, ran it, found a function that looked good to speed up, added some calls to __builtin_expect(), reran the code and got a rather interesting result.
I then reran the code, and got a completely different result.
In fact, each time I run the code, the profiling information I get is nearly useless—well, to a degree. For instance one run:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
100.00 | 0.01 | 0.01 | 119529 | 0.00 | 0.00 | line_ioreq |
0.00 | 0.01 | 0.00 | 141779 | 0.00 | 0.00 | BufferIOCtl |
0.00 | 0.01 | 0.00 | 60991 | 0.00 | 0.00 | line_readchar |
0.00 | 0.01 | 0.00 | 59747 | 0.00 | 0.00 | ht_readchar |
Then another run:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
33.33 | 0.01 | 0.01 | 119529 | 0.00 | 0.00 | line_ioreq |
33.33 | 0.02 | 0.01 | 60991 | 0.00 | 0.00 | line_readchar |
33.33 | 0.03 | 0.01 | 21200 | 0.00 | 0.00 | ufh_write |
0.00 | 0.03 | 0.00 | 141779 | 0.00 | 0.00 | BufferIOCtl |
Yet another run:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
0.00 | 0.00 | 0.00 | 141779 | 0.00 | 0.00 | BufferIOCtl |
0.00 | 0.00 | 0.00 | 119529 | 0.00 | 0.00 | line_ioreq |
0.00 | 0.00 | 0.00 | 60991 | 0.00 | 0.00 | line_readchar |
0.00 | 0.00 | 0.00 | 59747 | 0.00 | 0.00 | ht_readchar |
And still another one:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
50.00 | 0.01 | 0.01 | 60991 | 0.00 | 0.00 | line_readchar |
50.00 | 0.02 | 0.01 | 1990 | 0.01 | 0.01 | HtmlParseNext |
0.00 | 0.02 | 0.00 | 141779 | 0.00 | 0.00 | BufferIOCtl |
0.00 | 0.02 | 0.00 | 119529 | 0.00 | 0.00 | line_ioreq |
Like I said, nearly useless. Sure, there are the usual suspects, like BufferIOCtl() and line_ioreq(), but it's impossible to say what improvements I'm getting by doing this. And by today's standards, swift isn't a fast machine, being only (only!) a 1.3GHz Pentium III with half a gig of RAM. I can only imagine how hard profiling would be on a faster machine, or what could even be usefully profiled on one.
I have to wonder what the Linux guys are smoking to think that, in the grand scheme of things, __builtin_expect() will improve things all that much.
Unless they have access to better profiling mechanics than I do.
Looks like I might have to find a slower machine to get a better feel for how to improve the speed of the program.
Profiling is still viable, if you run the program long enough
So last night I said, “I might have to find a slower machine to get a better feel for how to improve the speed of [mod_blog].” Well, I found one other way—increase the amount of work the program does.
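Before getting to the specifics, here is the general idea in miniature (a sketch with hypothetical names, not the actual bp code): compile with profiling enabled (-pg under GCC), make the program repeat the interesting work enough times to run for a few seconds, then look at the gprof output.

    /* sketch only: build_page() stands in for whatever work is being
       profiled; compile with "gcc -pg work.c", run it, then run
       "gprof ./a.out gmon.out" to read the results                    */
    #include <stdio.h>

    static unsigned long build_page(unsigned long seed)
    {
      unsigned long sum = seed;
      unsigned long i;

      for (i = 0 ; i < 100000UL ; i++)
        sum = sum * 31UL + i;
      return sum;
    }

    int main(void)
    {
      unsigned long check = 0;
      int           i;

      /* one call finishes in microseconds; a few thousand calls give the
         profiler a long enough run to produce stable percentages        */
      for (i = 0 ; i < 5000 ; i++)
        check += build_page((unsigned long)i);

      printf("%lu\n",check);  /* keep the work from being optimized away */
      return 0;
    }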
mod_blog, or at least the program I was working with last night, bp (for “build page”), is responsible for building the HTML pages served up by the webserver. In its default mode (which is what I was working with last night) it generates the main index page and the RSS file, which only (only!) comes to about 100,000 bytes of output and happened too quickly to get any meaningful data out of profiling the program. But it can also generate pages with hundreds of entries given the right options, so by having it generate a page with every entry from 2000 through 2003 inclusive (which produces a page that is 3,352,028 bytes in size) I was able to get meaningful profiling information:
% time | cumulative seconds | self seconds | calls | self ms/call | total ms/call | name |
---|---|---|---|---|---|---|
29.73 | 0.22 | 0.22 | 5817769 | 0.00 | 0.00 | line_ioreq |
14.86 | 0.33 | 0.11 | 6650539 | 0.00 | 0.00 | BufferIOCtl |
10.81 | 0.41 | 0.08 | 2908866 | 0.00 | 0.00 | ht_readchar |
9.46 | 0.48 | 0.07 | 790760 | 0.00 | 0.00 | ufh_write |
5.41 | 0.52 | 0.04 | 2908398 | 0.00 | 0.00 | line_readchar |
Yes, with a run time of nearly 3 seconds (2.992) I was able to generate consistent, meaningful profiling information (meaning we went from calling BufferIOCtl() 141,779 times to 6,650,539 times), which was good enough to see if the GCC optimization of using -freorder-blocks with __builtin_expect() would help any, at least on the Intel x86 line.
Well, after running a dozen tests, I can say that it does help on the Intel platform, but at best, using __builtin_expect() with -freorder-blocks will only give you a few percent boost in speed. As in, a single-digit percentage boost.
In certain cases.
Certainly, when I did tests generating the output to /dev/null you could see the boost, but in running tests that generate an actual file (as opposed to just tossing the data in the bit bucket), it's not quite so clear cut (and for each test, I ran it five times, taking the timings from the fifth run, to help smooth out any system caching effects). The best improvement came when using -O3 -march=pentium3 -fomit-frame-pointer to compile the program, but it was still hardly noticeable from a user perspective (maybe a tenth or two tenths of a second).
Mark is expecting __builtin_expect() to have a better impact on systems where the CPU will use the hint.
Sunday, July 11, 2004
“Hey! What's that code doing there?”
While the __builtin_expect() aspect of GCC didn't work out, all the recent profiling I've done on mod_blog (which reminds me, I need to make the current codebase available) did, however, bring my attention to BufferIOCtl(), which, if you noticed, was one of the top four functions in terms of CPU utilization.
    int (BufferIOCtl)(const Buffer buf,int cmd, ... )
    {
      va_list alist;
      int     rc;

      ddt(buf        != NULL);
      ddt(buf->ioreq != NULL);
      ddt(cmd        > -1);

      if (buf == NULL)
        return(ErrorPush(CgiErr,BUFFERIOCTL,BUFERR_NULLPTR,"i",cmd));

      if (buf->ioreq == NULL)
        return(ErrorPush(CgiErr,BUFFERIOCTL,BUFERR_NULLHANDLER,"i",cmd));

      va_start(alist,cmd);
      rc = (*buf->ioreq)(buf,cmd,alist);
      va_end(alist);
      return(rc);
    }
ddt() is similar to the ANSI C call assert(), which basically states a condition that should hold (and if that condition isn't met, the program aborts—this action can be turned off for production code; it's meant for debugging). But you'll notice that the code first checks to see that buf is not NULL within ddt(), then the first thing it does afterwards is check to see if buf is NULL.

It shouldn't be NULL to begin with.
The same goes for the tests of buf->ioreq. When I removed the extraneous code:
    int (BufferIOCtl)(const Buffer buf,int cmd, ... )
    {
      va_list alist;
      int     rc;

      ddt(buf        != NULL);
      ddt(buf->ioreq != NULL);
      ddt(cmd        > -1);

      va_start(alist,cmd);
      rc = (*buf->ioreq)(buf,cmd,alist);
      va_end(alist);
      return(rc);
    }
The runtime of BufferIOCtl() dropped to 1/3 the original time.
Not much in the grand scheme of things, but it just goes to show you how expensive extraneous if statements can be. Especially if it's called 6,646,086 times.
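(Rough arithmetic, using the earlier profile's numbers: 0.11 seconds of self time spread over 6,650,539 calls works out to roughly 17 nanoseconds per call, so cutting BufferIOCtl() to a third of that saves somewhere around 0.07 seconds out of a run of about 3 seconds. A couple of percent, but it's a couple of percent for free.)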