Thursday, January 19, 2012
Okay, what failed this time?
I'm running the regression tests for “Project: Wolowizard” and about halfway through (around the two-hour mark or so) the tests start failing. Sometimes expected results just aren't showing up. I'm freaking out a bit because of all the issues we've had in running these tests, only for them to start failing in yet a different way.
Now, a bit about how this all works—there are four computers involved; one runs the tests, injecting messages towards a mini-cluster of two machines, either of which (depending on which one gets the message) sends a message to the fourth machine, which does a bunch of processing (which may involve interaction with a simulated cell phone on the testing machine), then responds back to the mini-cluster, which then responds back to the testing machine.
Now, I can check the immediate results from the mini-cluster, but the actual data I'm interested in is logged via syslog, so I have that data forwarded to the testing machine and my code grovels through a log file for the actual data I want. And it's that data (or part thereof) that apparently isn't being logged, and thus, the tests are failing.
Now, it just so happens that the part of the test that's failing is the part dealing with the mini-cluster, and it looks like about half the tests are failing (hmm …).
I log into each of the two computers comprising the mini-cluster, and check /etc/syslog.conf, on the off chance that it changed. Nope.
I then explain the problem to Bunny, standing (or rather, sitting) in as my
when it hits me—I should check to see if the program is running.
Rats. It is.
The tests are still failing, and my shoes began to squeak.
Okay, just because syslogd is running doesn't necessarily mean it's running correctly. So I run logger -p local1.info FOO on each machine and yes, one of the machines is failing to forward the logs to the testing machine. I restart syslogd on that system, and lo! The log entries are getting through now.
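That logger check can be made a little less manual. What follows is only a minimal sketch, not what I actually ran: the helper names (make_marker, found_in_log) and the FWDTEST- prefix are made up for illustration, and the path of the forwarded log on the testing machine is an assumption you'd have to fill in yourself.

```shell
#!/bin/sh
# Hypothetical helpers for checking syslog forwarding; all names here are
# illustrative assumptions, not part of the real test suite.

# make_marker HOST ID: build a unique string that is easy to search for.
make_marker() {
    printf 'FWDTEST-%s-%s\n' "$1" "$2"
}

# found_in_log MARKER LOGFILE: succeed if MARKER appears anywhere in LOGFILE.
found_in_log() {
    grep -qF -- "$1" "$2"
}

# On each mini-cluster machine (same facility and priority as in the text):
#   logger -p local1.info "$(make_marker "$(hostname)" "$$")"
# On the testing machine, with LOGFILE set to wherever the forwarded log lands:
#   found_in_log "FWDTEST-somehost-1234" "$LOGFILE" || echo "not forwarded"
```

A unique marker per machine makes it obvious which of the two hosts has gone quiet, instead of squinting at a pile of identical FOO lines.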
You know, I expect there to be issues with the stuff I'm testing; what I don't expect is for the stuff that we didn't write to have issues (the Protocol Stack From Hell™ notwithstanding).
Okay, reset everything and start the regression test over again …
Update in the wee hours of the morning, Friday, January 20th, 2012
A bit over halfway through the regression tests, and the log files rotate. Aaaaaaaaaah! Okay, reset all the data, and start from the last failed test. That's easy, since I can specify which cases to run. That's hard, because I have to specify nearly 100 cases. That's easy, since I can use the Unix command seq to list them. That's hard, because the test cases aren't just numbers, but things like “1.b.77” and “1.c.18”, and while the shell supports command line expansion from a running program via the backtick (à la for i in `seq 34 77`; do echo 1.b.$i; done) I need to nest two such operations (echo `for i in `seq 34 77`;do echo 1.b.$i; done`) to specify the test cases from the command line, and unescaped backticks don't nest like that. Okay, I can create a temporary file that lists the test cases …
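For what it's worth, the POSIX $(...) form of command substitution does nest without any escaping, so the list could probably have been built in place. A minimal sketch, assuming a POSIX shell with seq available; case_ids is a hypothetical helper name, and how the resulting list gets handed to the test program is left out because that depends on the program.

```shell
#!/bin/sh
# case_ids PREFIX FIRST LAST: print PREFIX.FIRST through PREFIX.LAST,
# one test case name per line. (Hypothetical helper, for illustration.)
case_ids() {
    for i in $(seq "$2" "$3"); do
        echo "$1.$i"
    done
}

# Two levels of substitution, no escaping needed: the inner $(...) runs
# the loop, and the unquoted echo flattens its output onto one line.
list=$(echo $(case_ids 1.b 34 77))
echo "$list"
```

Backticks can be nested as well, but only if the inner pair is escaped with backslashes, which gets ugly fast.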