Thursday, April 13, 2006
Leaky abstractions at hot spots
Wlofie mentioned that clive, a C-based LiveJournal client, was vastly
slower than jlj, a Perl-based LiveJournal client. Having encountered
weirdness like this in the past, I thought it might be fun to profile the
code and see where the problem might lie.
% time | cumulative seconds | self seconds | calls | self µs/call | total µs/call | name
---|---|---|---|---|---|---
73.33 | 0.11 | 0.11 | 3693 | 29.79 | 29.79 | getalloc |
6.67 | 0.12 | 0.01 | 18630 | 0.54 | 0.54 | lj_debug |
6.67 | 0.13 | 0.01 | 3323 | 3.01 | 3.55 | hash |
6.67 | 0.14 | 0.01 | 1483 | 6.74 | 10.78 | getp |
6.67 | 0.15 | 0.01 | 4 | 2500.00 | 34335.39 | delete_hash |
0.00 | 0.15 | 0.00 | 3693 | 0.00 | 29.79 | lj_free |
0.00 | 0.15 | 0.00 | 1841 | 0.00 | 0.00 | chomp |
0.00 | 0.15 | 0.00 | 941 | 0.00 | 10.78 | get |
0.00 | 0.15 | 0.00 | 936 | 0.00 | 0.00 | grow_alloctable |
0.00 | 0.15 | 0.00 | 936 | 0.00 | 0.54 | lj_malloc |
0.00 | 0.15 | 0.00 | 936 | 0.00 | 0.00 | newalloc |
0.00 | 0.15 | 0.00 | 924 | 0.00 | 1.07 | next |
0.00 | 0.15 | 0.00 | 920 | 0.00 | 137.20 | del |
0.00 | 0.15 | 0.00 | 920 | 0.00 | 6.23 | put |
0.00 | 0.15 | 0.00 | 12 | 0.00 | 0.00 | MD5Update |
0.00 | 0.15 | 0.00 | 11 | 0.00 | 35.15 | geti |
0.00 | 0.15 | 0.00 | 11 | 0.00 | 35.15 | getpi |
0.00 | 0.15 | 0.00 | 7 | 0.00 | 0.38 | lj_set_config |
0.00 | 0.15 | 0.00 | 6 | 0.00 | 0.00 | MD5Transform |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | MD5Data |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | MD5End |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | MD5Final |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | MD5Init |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | MD5Pad |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | closeconn |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.54 | create_hash |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | getconn |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 1556.88 | lj_send_request |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.54 | md5_hex |
0.00 | 0.15 | 0.00 | 4 | 0.00 | 0.00 | reset |
0.00 | 0.15 | 0.00 | 3 | 0.00 | 0.54 | lj_realloc |
0.00 | 0.15 | 0.00 | 3 | 0.00 | 1.07 | lj_urlencode |
0.00 | 0.15 | 0.00 | 2 | 0.00 | 36006.75 | lj_getchallenge |
0.00 | 0.15 | 0.00 | 2 | 0.00 | 36009.43 | lj_setauth |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 0.00 | create_getfriends_request |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 1.07 | create_login_request |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 0.77 | lj_ask_username |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 0.54 | lj_config_file |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 76284.83 | lj_getfriends |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 4.83 | lj_init_config |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 73708.73 | lj_login |
0.00 | 0.15 | 0.00 | 1 | 0.00 | 150000.00 | main |
Looking over the results, the obvious problem spots are
delete_hash(), lj_getchallenge(), lj_setauth(), lj_getfriends()
(the option I used to test the program with) and lj_login(). A quick
glance at delete_hash() didn't show any problems, and commenting it out
didn't solve anything.
So I did the next best thing: ran the program under a debugger and
followed the execution to see where it might be hanging. A breakpoint in
lj_login() (another possible hot spot) and following the code from there
led to the following bit of code (from lj_send_request()):
```c
/*
 * First we ditch the headers ... if we ever wanted to do
 * anything with the headers this is where we'd put the code.
 * I'm using key here because it's just as good as any other buffer.
 * I could set up a buffer just for this, but let's not waste memory.
 */

while (fgets(key, LJ_BUFFER_MAX, conn) && strcmp(key, "\r\n"))
{
#ifdef DEBUG3
  int i = strlen(key);

  if (!strcmp(key + (i - 2), "\r\n"))
    key[i - 2] = '\0';
  lj_debug(3, "[%s]\n", key);
#endif
}

/*
 * Now we read in the key-value pairs.  We're making the
 * assumption that neither the key nor the value will exceed
 * 1024 bytes.  REVISIT THIS
 */

while (fgets(key, LJ_BUFFER_MAX, conn) && fgets(value, LJ_BUFFER_MAX, conn))
{
  chomp(key);
  chomp(value);
  lj_debug(3, "Putting [%s]->[%s]...", key, value);
  put(lj_hash, key, value);
}

closeconn(conn);
```
The code was blocking in the second loop on a call to fgets(). And I
pretty much knew what was happening: leaky abstractions (and it's
fitting that the article talks about TCP, which is exactly where the
leaky abstraction is happening in this very piece of code).
fgets() is a nice function: it reads in a line of text, and if there's
no more text to read, it returns NULL, indicating there is no more
input. But fgets() is (in the case of Unix) written on top of
lower-level code, notably read(). When applied to a disk file, if there
is no more data to be read, read() returns 0, meaning “no more data.”
But we're not reading from a file. We're reading from a network
connection, and therein lies the difference. read() also works for
network connections, but it behaves just a bit differently there. With a
disk file, it's pretty apparent when there is no more data to read; with
a TCP connection, it's not at all clear-cut when the other side has
stopped sending data. So read() will just sit there, waiting for data to
come in. If no data arrives after a period of time (about 30 seconds on
my computer), read() finally concludes that there's no more data to be
had. That result then bubbles up to fgets() and the loop finally
terminates.
But not until a long 30 seconds per request goes by.
No wonder clive is slow (and no, it has nothing to do with “keepalives”
in the HTTP protocol—I tested that hypothesis).
The fix is to parse the headers for a “Content-Length” header and read only that many bytes. But since Wlofie is now using another program, and I don't really post to my LiveJournal (except once a year), there's no real incentive to fix the program (but I did find the problem, which is what I set out to do).