The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, April 13, 2006

Leaky abstractions at hot spots

Wlofie mention clive, a C-based Live Journal client, was vastly slower than jlj, a Perl based Live Journal client. Having encountered some weirdness in the past like this, I thought it might be fun to profile the code and see where the problem might lie.

Results of profiling
% time cumulative seconds self seconds calls self µs/call total µs/call name
% time cumulative seconds self seconds calls self µs/call total µs/call name
73.33 0.11 0.11 3693 29.79 29.79 getalloc
6.67 0.12 0.01 18630 0.54 0.54 lj_debug
6.67 0.13 0.01 3323 3.01 3.55 hash
6.67 0.14 0.01 1483 6.74 10.78 getp
6.67 0.15 0.01 4 2500.00 34335.39 delete_hash
0.00 0.15 0.00 3693 0.00 29.79 lj_free
0.00 0.15 0.00 1841 0.00 0.00 chomp
0.00 0.15 0.00 941 0.00 10.78 get
0.00 0.15 0.00 936 0.00 0.00 grow_alloctable
0.00 0.15 0.00 936 0.00 0.54 lj_malloc
0.00 0.15 0.00 936 0.00 0.00 newalloc
0.00 0.15 0.00 924 0.00 1.07 next
0.00 0.15 0.00 920 0.00 137.20 del
0.00 0.15 0.00 920 0.00 6.23 put
0.00 0.15 0.00 12 0.00 0.00 MD5Update
0.00 0.15 0.00 11 0.00 35.15 geti
0.00 0.15 0.00 11 0.00 35.15 getpi
0.00 0.15 0.00 7 0.00 0.38 lj_set_config
0.00 0.15 0.00 6 0.00 0.00 MD5Transform
0.00 0.15 0.00 4 0.00 0.00 MD5Data
0.00 0.15 0.00 4 0.00 0.00 MD5End
0.00 0.15 0.00 4 0.00 0.00 MD5Final
0.00 0.15 0.00 4 0.00 0.00 MD5Init
0.00 0.15 0.00 4 0.00 0.00 MD5Pad
0.00 0.15 0.00 4 0.00 0.00 closeconn
0.00 0.15 0.00 4 0.00 0.54 create_hash
0.00 0.15 0.00 4 0.00 0.00 getconn
0.00 0.15 0.00 4 0.00 1556.88 lj_send_request
0.00 0.15 0.00 4 0.00 0.54 md5_hex
0.00 0.15 0.00 4 0.00 0.00 reset
0.00 0.15 0.00 3 0.00 0.54 lj_realloc
0.00 0.15 0.00 3 0.00 1.07 lj_urlencode
0.00 0.15 0.00 2 0.00 36006.75 lj_getchallenge
0.00 0.15 0.00 2 0.00 36009.43 lj_setauth
0.00 0.15 0.00 1 0.00 0.00 create_getfriends_request
0.00 0.15 0.00 1 0.00 1.07 create_login_request
0.00 0.15 0.00 1 0.00 0.77 lj_ask_username
0.00 0.15 0.00 1 0.00 0.54 lj_config_file
0.00 0.15 0.00 1 0.00 76284.83 lj_getfriends
0.00 0.15 0.00 1 0.00 4.83 lj_init_config
0.00 0.15 0.00 1 0.00 73708.73 lj_login
0.00 0.15 0.00 1 0.00 150000.00 main

In looking over the results, it's obvious the problem spots are in delete_hash(), lj_getchallenge(), lj_setauth(), lj_getfriends() (the option I used to test the program with) or lj_login. A quick glance at delete_hash() didn't show any problems, and commenting it out didn't solve anything.

So do the next best thing—run the program under a debugger and follow the execution to see where it might be hanging. A breakpoint in lj_login() (another possible hot spot) and following the code there lead to the following bit of code (from lj_send_request()):

/*
 * First we ditch the headers...if we ever wanted to do
 * anything with the headers this is where we 'd put the code
 * I'm using key here because it' s just as good as any other buffer
 * I could set up a buffer just for this, but let's not waste memory
 */
while (fgets(key, LJ_BUFFER_MAX, conn) && strcmp(key, "\r\n"))
{
#ifdef DEBUG3
  int i = strlen(key);

  if (!strcmp(key + (i - 2), "\r\n"))
    key[i - 2] = '\0';
  lj_debug(3, "[%s]\n", key);
#endif
}

/*
 * Now we read in the key-value pairs.  We're making the
 * assumption that neither the key nor the value will exceed
 * 1024 bytes.  REVISIT THIS
 */
while (fgets(key, LJ_BUFFER_MAX, conn)
       && fgets(value, LJ_BUFFER_MAX, conn))
{
  chomp(key);
  chomp(value);
  lj_debug(3, "Putting [%s]->[%s]...", key, value);
  put(lj_hash, key, value);
}

closeconn(conn);

The code was blocking in the second loop to a call to fgets(). And I pretty much knew what was happening: leaky abstractions (and it's fitting that the article talks about TCP, which is exactly where the leaky abstraction is happening in this very piece of code).

fgets() is a nice function, since it reads in a line of text; if there's no more text to read, it returns NULL indicating there is no more input. But fgets() is (in the case of Unix) written on top of lower level code, notably read(). When applied to a disk file, if there is no more data to be read, read() returns a 0, meaning “no more data.”

But we're not reading from a file. We're reading from a network connection. And therein lies a difference. The function read() also works for network connections, but it behaves just a bit differently in that case. In the case of a disk file, it's pretty apparent when there is no more data to read. But in the case of a TCP connection, it's not clear-cut when the other side has stopped sending data. So read() will just sit there, waiting for data to come in. If there's no data after a period of time (about 30 seconds on my computer) read() will finally conclude that there's no data to be had. That result then bubbles up to fgets() and the loop finally terminates.

But not until a long 30 seconds per request goes by.

No wonder clive is slow (and no, it has nothing to do with “keepalives” in the HTTP protocol—I tested that hypothesis).

The fix is to parse the headers for a “Content-Length” header and read only those number of bytes, but since Wlofie is now using another program, and I don't really post to my LiveJournal (except once a year) there's no incentive to really fix the program (but I did find the problem, which is what I set out to do).

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2021 by Sean Conner. All Rights Reserved.