Wednesday, January 18, 2006

LaBrea ad nauseam

Just because I was curious, I ran the Perl script that parses the LaBrea logs on the 1.1G logfile from the other day. I knew it would take at least four hours or so to run.

I then created a modified version of ltpstat to parse this file (it's the output from syslogd, hence some slight parsing modifications were required) and ran it at the same time.

What I found interesting (during the actual run of both programs) is that my program had a larger virtual memory footprint than the Perl version (easily seven to eight times larger), but the resident set size of my program (the amount of physical memory actually in use, as opposed to memory that is either not yet physically allocated or has been shoved off into swap space) was half that of the Perl script. In retrospect this was expected: Perl was growing its data structures as the data was being generated, whereas my program allocates the whole thing at once, but my program has less overhead in keeping track of said data.
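The gap between virtual size and resident set size is easy to reproduce. What follows is only a rough sketch, not how ltpstat itself works: it assumes a Linux system with `/proc/self/status`, and the function names and the pool and touch sizes are made up for illustration. It reserves one large anonymous mapping up front (standing in for a pool allocated all at once), then writes to only part of it, and reports how the two numbers move.

```python
import mmap


def read_status():
    """Return (VmSize, VmRSS) in KiB from /proc/self/status (Linux only)."""
    fields = {}
    with open("/proc/self/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])  # value is reported in kB
    return fields["VmSize"], fields["VmRSS"]


def measure_preallocation(pool_bytes=256 * 1024 * 1024,
                          touched_bytes=16 * 1024 * 1024):
    """Reserve a large pool at once, touch part of it, report memory growth.

    Both sizes are hypothetical; they are not taken from ltpstat.
    """
    vm0, rss0 = read_status()

    # Reserve the whole pool in one go. The kernel hands out address space
    # here, but no physical pages until they are actually written to.
    pool = mmap.mmap(-1, pool_bytes,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    vm1, rss1 = read_status()

    # Write one byte per page in the first part of the pool, forcing those
    # pages (and only those) to become resident.
    for offset in range(0, touched_bytes, mmap.PAGESIZE):
        pool[offset] = 1
    vm2, rss2 = read_status()

    pool.close()
    return {
        "vm_growth_kib": vm1 - vm0,
        "rss_growth_untouched_kib": rss1 - rss0,
        "rss_growth_touched_kib": rss2 - rss1,
    }


if __name__ == "__main__":
    for name, value in measure_preallocation().items():
        print(f"{name}: {value}")
```

On a typical Linux box the virtual size should jump by about the full pool size as soon as the mapping is made, while the resident size should only grow once pages are written, and then only by roughly the amount touched.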

And ltpstat is faster than the Perl script, even if it isn't gathering the stats in real time: 3 hours, 22 minutes and 24 seconds to run vs. 6 hours, 39 minutes and 49 seconds, or roughly half the time. I didn't check how much memory the Perl script was using just prior to finishing, but I can't see how it would be less than ltpstat.
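For the record, the arithmetic behind "half the time" works out as below. This is just a quick check using the two run times quoted above; the variable and function names are mine.

```python
from datetime import timedelta

# Run times as quoted in the post.
LTPSTAT_RUN = timedelta(hours=3, minutes=22, seconds=24)
PERL_RUN = timedelta(hours=6, minutes=39, seconds=49)


def runtime_ratio(fast, slow):
    """Fraction of the slower run's time that the faster run needed."""
    return fast / slow  # timedelta / timedelta yields a float


if __name__ == "__main__":
    ratio = runtime_ratio(LTPSTAT_RUN, PERL_RUN)
    print(f"ltpstat took {ratio:.1%} of the Perl script's time")
    print(f"speedup: {1 / ratio:.2f}x")
```

That comes to a shade over half the Perl script's run time, or a speedup of a little under two.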

The instance of ltpstat I started yesterday is still running:

Start: Tue Jan 17 14:55:59 2006 End: Wed Jan 18 14:55:59 2006 Running time: 1d
Pool-max: 1048576
Pool-num: 388929
Rec-max:  1048576
Rec-num:  388929
UIP-max:  1048576
UIP-num:  20298
Reported-bandwidth: 64 (Kb/sec)

Looks like I may break a million connections sooner than I expected.
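A back-of-the-envelope projection, assuming (and it is only an assumption) that Rec-num counts connections and that they keep arriving at the same average rate as during the first day. The names below are mine, not ltpstat's.

```python
from datetime import datetime, timedelta

# Figures copied from the ltpstat output above.
START = datetime(2006, 1, 17, 14, 55, 59)
ELAPSED = timedelta(days=1)
REC_NUM = 388929
REC_MAX = 1048576


def projected_time(target, count=REC_NUM, elapsed=ELAPSED, start=START):
    """Estimate when `target` records are reached at a constant rate."""
    rate = count / elapsed.total_seconds()  # records per second so far
    return start + timedelta(seconds=target / rate)


if __name__ == "__main__":
    print("one million records:", projected_time(1_000_000))
    print("Rec-max reached:    ", projected_time(REC_MAX))
```

At that rate, both the million mark and the Rec-max limit would be hit on the morning of January 20, a bit over two and a half days after the start. Real traffic won't be this uniform, so treat it as a rough guess.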

