Tuesday, January 17, 2006
“It comes in two flavors, ‘icky’ and ‘yucky.’”
The magnets are still stuck in The Younger. And this morning he was in pain. Quite a bit of it too. Hopefully the pain means they are on the move, but we won't know until later today after his insides are scanned once again.
If the magnets haven't moved, tomorrow the doctors go in.
Here's hoping the magnets leave his system on their own.
Real-time LaBrea data processing program
I finished writing the real-time LaBrea data processing program (most of it last night at The Hospital) and the final few bugs were real doozies.
The program works by getting data from LaBrea, then it looks up the connection and updates the information accordingly. The code pretty much looks like:
void start_tarpit(time_t stamp,char *line)
{
  struct tprecord *exists;
  struct tprecord  rec;
  size_t           index;

  read_record(&rec,line);
  exists = tpr_search(&rec,&index);
  if (exists == NULL)
    exists = pull_free_record();
  exists->src   = rec.src;
  exists->sport = rec.sport;
  /* ... */
  record_add(index,exists);
}

/*******************************/

struct tprecord *pull_free_record(void)
{
  if (g_poolnum == g_poolmax)
    do_forced_garbage_collection();
  return(&g_pool[g_poolnum++]);
}

/*****************************/

void record_add(size_t index,struct tprecord *rec)
{
  if (g_recnum == g_recmax)
    do_forced_garbage_collection();
  memmove(
           &g_rec[index + 1],
           &g_rec[index],
           (g_recnum - index + 1) * sizeof(struct tprecord *)
         );
  g_rec[index] = rec;
  g_recnum++;
}
tpr_search() is the binary search routine I wrote the other day, and as you can see, it returns the index to where the record is in the array, or to where it should be. pull_free_record() just returns the next free slot in the structure array, and if there are no slots available, it removes some older records according to some criteria. And record_add() will add the record to the pointer array, also removing older records if there is no space left.
The problem is what happens when a forced garbage collection runs: some records are deleted, all remaining records move about, and the pointer array is re-sorted. So between the call to exists = tpr_search(&rec,&index); and the call to record_add(index,exists); the value in index may not be a valid index anymore!
Oops.
(Never mind the fact that one of the two calls to do_forced_garbage_collection() is redundant.)
Simple enough to fix once I knew what was going on.
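One way to make the index safe, sketched here with a toy four-slot table: do the allocation (and any collection it triggers) first, then search again right before inserting. The eviction policy, the live flag, the unsigned src key and the simplified start_tarpit(src) signature are all assumptions for illustration; only the function names come from the code above, and this is not necessarily the fix that went into ltpstat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXREC 4                      /* tiny on purpose so collection triggers quickly */

struct tprecord { unsigned src; int live; };

struct tprecord  g_pool[MAXREC];      /* structure array */
struct tprecord *g_rec[MAXREC];       /* pointer array, kept sorted by src */
size_t           g_recnum;

/* Binary search: returns the match or NULL; *pidx receives the position
 * of the record, or the position where it should be inserted. */
struct tprecord *tpr_search(unsigned src, size_t *pidx)
{
  size_t lo = 0, hi = g_recnum;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (g_rec[mid]->src < src) lo = mid + 1; else hi = mid;
  }
  *pidx = lo;
  return (lo < g_recnum && g_rec[lo]->src == src) ? g_rec[lo] : NULL;
}

/* Stand-in policy (an assumption): evict the first half of the sorted array. */
void do_forced_garbage_collection(void)
{
  size_t drop = g_recnum / 2;
  for (size_t i = 0; i < drop; i++) g_rec[i]->live = 0;
  memmove(&g_rec[0], &g_rec[drop], (g_recnum - drop) * sizeof g_rec[0]);
  g_recnum -= drop;
}

struct tprecord *pull_free_record(void)
{
  if (g_recnum == MAXREC) do_forced_garbage_collection();
  for (size_t i = 0; i < MAXREC; i++)
    if (!g_pool[i].live) { g_pool[i].live = 1; return &g_pool[i]; }
  return NULL;                        /* not reached: collection frees slots */
}

void start_tarpit(unsigned src)
{
  size_t index;
  struct tprecord *exists = tpr_search(src, &index);

  if (exists != NULL) return;         /* already tracked; real code would update it */
  exists = pull_free_record();        /* may collect, which shifts the pointer array */
  assert(exists != NULL);
  exists->src = src;

  tpr_search(src, &index);            /* the fix: recompute the insertion point */
  memmove(&g_rec[index + 1], &g_rec[index],
          (g_recnum - index) * sizeof g_rec[0]);
  g_rec[index] = exists;
  g_recnum++;
}
```

Fill the table with 10, 20, 30 and 40, then add 50: the first search says "insert at 4", the collection drops 10 and 20, and the second search correctly says "insert at 2". The second search costs a few extra comparisons, which is cheap next to inserting at a stale index.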
Another bug dealt with named pipes. Instead of directly piping the data from LaBrea to ltpstat (what I call the real-time LaBrea data processing program), I decided to go through a named pipe, which would allow me to start and stop either one independently of the other.
Now, I'm testing my program, running cat sample > labrea.pipe in one window, and ltpstat --input labrea.pipe in another. It's working, until the data in sample runs out and cat closes its side of the named pipe.
Now, the code that reads in the data is in a library I wrote, and it just assumes when a read() returns 0, that it's the end of the file, and marks it as being closed. ltpstat ignores the “end of file” status, and keeps trying to read a now-closed file. We get into a busy loop and the system load shoots up. Also, if I now try to pump more data through the named pipe, ltpstat ignores the data.
Even modifying the library code to not mark “end of file” when there's nothing to read does nothing, as from that point on read() just returns nothing anyway. This seems to be a “feature” of Linux (or Unix; I didn't have Advanced Programming in the Unix® Environment with me to look this up), so I restructure the main loop:
while(1)
{
  in = openinput();
  while(!StreamEOF(in))
  {
    process_labrea_output();
  }
  StreamFree(in);
}
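For anyone without the stream library, here is roughly what the inner part of that loop comes down to in plain POSIX calls. This is a sketch, not the library code: drain_fifo_once() and feed_fifo() are hypothetical names, and feed_fifo() only exists to play the part of cat sample > labrea.pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read everything one writer sends through the FIFO at `path`.
 * open() blocks until a writer shows up; read() returning 0 means the
 * last writer has closed its end. Returns bytes seen, or -1 on error. */
long drain_fifo_once(const char *path)
{
  char    buf[4096];
  long    total = 0;
  ssize_t n;
  int     fd;

  assert(path != NULL);
  fd = open(path, O_RDONLY);
  if (fd < 0) return -1;

  while ((n = read(fd, buf, sizeof buf)) != 0) {
    if (n < 0) {
      if (errno == EINTR) continue;   /* interrupted by a signal: try again */
      close(fd);
      return -1;
    }
    total += n;                       /* a real program would parse log lines here */
  }
  close(fd);                          /* closing and reopening lets the next writer in */
  return total;
}

/* Stand-in for `cat sample > labrea.pipe`: a child writes msg and exits. */
pid_t feed_fifo(const char *path, const char *msg)
{
  pid_t pid = fork();
  if (pid == 0) {
    int fd = open(path, O_WRONLY);
    if (fd >= 0) {
      ssize_t w = write(fd, msg, strlen(msg));
      (void)w;
      close(fd);
    }
    _exit(0);
  }
  return pid;
}
```

Calling drain_fifo_once() inside a for(;;) gives the same shape as the loop above: each pass waits for a new writer, consumes its data, and starts over when that writer goes away, with no busy loop in between.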
That was all fine and good, until I threw signals into the mix.
I use signals to tell ltpstat to dump various information: SIGHUP to print the number of connections and unique IPs being tracked, SIGUSR1 to do a raw dump of all the data accumulated so far, and SIGUSR2 to generate a more or less human-readable dump.
But signals are basically interrupts. And operating system theory states that one process should never find another process with its pants down (so to speak—and I should warn you—the paper I linked to is way more technical than I've gotten here). Signals and system calls interact (that is, if a process is signaled while making a system call into the kernel) in one of two ways—the system call will simply fail, or it will be restarted automatically. And one can select which method to use.
If I elected to have the system call fail, then ltpstat would fail, either in opening (a system call) the named pipe, or in reading (another system call) the named pipe, after I signal the program.
If I elected to have the system call restarted, then the signal handlers I set up would never get called (due to the way I handle the signals).
I ended up reimplementing two routines from my library (which is used in more than just this program) for just this program. I select the “system call fail” method, check to see if the system call failed due to a signal, and if so, check for the signals and try again.
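The general pattern, as a minimal sketch: install the handler without SA_RESTART so an interrupted call fails with EINTR, have the handler do nothing but set a flag, and wrap read() so it services the flag and retries. The names here (on_sighup, check_signals, read_retry, g_reports) are made up for the example and are not the routines from my library; only SIGHUP is shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

volatile sig_atomic_t g_sighup;       /* set in the handler, cleared in the main flow */
int                   g_reports;      /* how many reports have been produced */

void on_sighup(int sig)
{
  (void)sig;
  g_sighup = 1;                       /* just note it; do the real work later */
}

int install_handler(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_sighup;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;                    /* no SA_RESTART: blocked calls fail with EINTR */
  return sigaction(SIGHUP, &sa, NULL);
}

void check_signals(void)
{
  if (g_sighup) {
    g_sighup = 0;
    g_reports++;                      /* a real program would print its counts here */
  }
}

/* read() that survives signals: service any pending signal, and retry
 * only when the call failed because it was interrupted. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
  for (;;) {
    ssize_t n = read(fd, buf, len);
    if (n < 0 && errno == EINTR) {
      check_signals();
      continue;
    }
    check_signals();                  /* also catch signals that arrived mid-read */
    return n;
  }
}
```

The same wrapper shape applies to open() on the named pipe. Keeping the handler down to a single flag assignment is what makes it safe to run at any point, and the cost is that nothing gets reported until the main flow reaches check_signals().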
Again, this took a few hours to track down.
But now the program works, and I can finally get real time statistics from LaBrea.
Last minute update!
I just received a call from Spring: the magnets are in The Younger's stomach! (Why this wasn't found out sooner, I don't know.) So, because they're in his stomach, he's being moved yet again to another hospital, one that has the equipment that might get the magnets out of his stomach.
I'll be heading there once I get word which hospital it is.
Some initial data from a real-time LaBrea data processing program
While I'm waiting for a call back, some more on LaBrea.
Yesterday (from January 16 at 06:28:25 to January 17 at 08:54:50) LaBrea generated 1.1G of log data, and it took a full five minutes to run grep 'Initial Connect' daemon.log.0 | wc -l (255,344 new tarpitting connections, by the way).
LaBrea was also running at full speed, maxed out at 64Kbps bandwidth to keep all these connections tarpitted (the maximum I set LaBrea to use, by the way).
That first large dip in the graph (the one around 6:30 in the morning) is probably due to the system attempting to rotate a 1.1G log file. The second dip, at the right (around 3:00 pm), is when I restarted LaBrea so its logging information would go through ltpstat.
After an hour of running:
Start: Tue Jan 17 14:55:59 2006
End: Tue Jan 17 15:55:59 2006
Running time: 1h
Pool-max: 1048576
Pool-num: 24322
Rec-max: 1048576
Rec-num: 24322
UIP-max: 1048576
UIP-num: 1282
Reported-bandwidth: 40 (Kb/sec)
And after two hours:
Start: Tue Jan 17 14:55:59 2006
End: Tue Jan 17 16:56:19 2006
Running time: 2h 20s
Pool-max: 1048576
Pool-num: 33326
Rec-max: 1048576
Rec-num: 33326
UIP-max: 1048576
UIP-num: 1632
Reported-bandwidth: 40 (Kb/sec)
And right this second:
Start: Tue Jan 17 14:55:59 2006
End: Tue Jan 17 18:37:19 2006
Running time: 3h 41m 20s
Pool-max: 1048576
Pool-num: 42931
Rec-max: 1048576
Rec-num: 42931
UIP-max: 1048576
UIP-num: 2148
Reported-bandwidth: 40 (Kb/sec)
Okay, pool-max and rec-max are the maximum sizes for the structure array and pointer array, and both should always be equal (I'm displaying this number more for debugging purposes than anything else), while pool-num and rec-num (which should also always be equal) represent the current number of connections tarpitted. I also keep track of unique IPs, of which there are currently 2,148 (out of the 1,048,576 that I can store). I also just found out that IP address 195.130.152.85 has 4,809 connections currently tarpitted (and in the few seconds it took to do that query, five more connections were tarpitted).
I'll be releasing this code in the next few days, when I can write up some documentation and slap on a license.
This just in …
Just now received a call from Spring—the trip to the other hospital has been postponed until tomorrow morning.
That is all.