Sunday, March 11, 2007
Home alone (with a few cats though)
Spring and Wlofie are away for the week, gone camping at Gulf Wars, an SCA event.
Me? I'm not the camping type; roughing it is a hotel room with spotty Internet access.
Here's hoping they have fun.
As you wish.
Computers excel at following instructions to the letter.
Programmers don't quite excel at giving instructions to the computer.
Case in point: the daemon I'm working on. Through testing, I found that the automatic restarting wasn't working in all cases. If the program ran in the foreground, it would restart properly upon a crash. If it started up as an actual daemon though, it would fail. It took me a few hours to debug the problem, primarily because for this problem, I couldn't use gdb (the Unix debugger) for a few reasons:
- going into daemon mode creates a new process, which isn't the one that gdb starts debugging. To get around that problem, you can start the program up, and then attach gdb to the running process.
- that still leaves the other problem: gdb will catch the segfault for you, instead of passing it on to the program. There very well may be a way to pass it on, but I'm not sure how well gdb handles signal handlers.
Painful as it is, the lack of a debugger can be worked around. And before I reveal the actual problem, here's the relevant code (sans error checking, as that only clutters things up):
    int main(int argc,char *argv[])
    {
      global_argv = argv;  /* save argument list for later restarting */

      if (gf_run_in_foreground == 0)
        daemon_init();

      signal(SIGSEGV,crash_recovery);

      /* rest of program */
    }

    void daemon_init(void)
    {
      pid_t pid;

      pid = fork();
      if (pid != 0)          /* parent exits, child process continues on */
        exit(EXIT_SUCCESS);

      chdir("/tmp");         /* safe place to execute from */
      setsid();              /* become a session leader */
      close(STDERR_FILENO);  /* close these, we don't need them */
      close(STDOUT_FILENO);
      close(STDIN_FILENO);
    }

    void crash_recovery(int sig)
    {
      extern char **environ;
      sigset_t      sigset;

      syslog(LOG_ERR,"restarting program");

      /*---------------------------------
      ; unblock any blocked signals,
      ; including the one we're handling
      ;---------------------------------*/

      sigfillset(&sigset);
      sigprocmask(SIG_UNBLOCK,&sigset,NULL);

      /*---------------------------------
      ; restart ourselves. If the call
      ; to execve() fails, there's not
      ; much else to do but exit.
      ;---------------------------------*/

      execve(global_argv[0],global_argv,environ);
      _exit(EXIT_FAILURE);
    }
Another bit of critical information: I would start the program thusly:
GenericUnixPrompt> ./obj/daemon
If you're good (say, the calibre of Mark) you'll see the problem. If not, don't worry; it took me a few hours. Here's a hint: Once I removed the call to chdir(), the code worked fine in daemon mode, and no, chdir() wasn't failing.
In fact, it didn't matter where I put the chdir() call; having it in there would cause the re-exec to fail when running in daemon mode.
The problem?
Because the daemon changed directories, the relative path I used to start the program (./obj/daemon) was no longer valid by the time execve() was called, and of all the places where I could have checked a return code, that wasn't one of them. It didn't dawn on me (until thinking about it for a while after removing the call to chdir()) what the actual problem was.
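One way around this, as a minimal sketch and not the code the daemon actually ended up with: resolve the program's path to an absolute one with realpath() before anything calls chdir(), and log errno if execve() does come back. The names save_self_path(), restart_self() and g_self_path are hypothetical, made up for this illustration. It also assumes the program is started with a path containing a slash (like ./obj/daemon); a name found through $PATH would need a different lookup.

```c
#define _XOPEN_SOURCE 700  /* expose realpath() and PATH_MAX in strict modes */

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical global: absolute path of the executable, set once at startup. */
static char g_self_path[PATH_MAX];

/*---------------------------------------------------
; Call from main() BEFORE daemon_init(), that is,
; before chdir() changes what a relative path means.
; Returns 0 on success, -1 if realpath() fails (errno
; is left as realpath() set it).
;---------------------------------------------------*/
int save_self_path(const char *argv0)
{
  char resolved[PATH_MAX];

  assert(argv0 != NULL);
  if (realpath(argv0,resolved) == NULL)
    return -1;
  strcpy(g_self_path,resolved);  /* both buffers are PATH_MAX bytes */
  return 0;
}

/*---------------------------------------------------
; Re-exec using the saved absolute path. execve()
; only returns on failure, so anything after it is
; the error path; log why before giving up. (Like the
; original handler, this calls syslog() from a signal
; handler, which is not strictly async-signal-safe.)
;---------------------------------------------------*/
void restart_self(char *const argv[])
{
  execve(g_self_path,argv,environ);
  syslog(LOG_ERR,"execve(%s) failed: %s",g_self_path,strerror(errno));
  _exit(EXIT_FAILURE);
}
```

With something like this, main() would call save_self_path(argv[0]) right after saving global_argv, and crash_recovery() would call restart_self(global_argv) in place of the bare execve(). Even without the absolute path, the log line alone would presumably have reported "No such file or directory" and saved a few hours.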
Sheesh.
Here was the program, doing exactly what I told it to do, only I didn't realize that what I was telling it to do wasn't what I thought I was telling it to do.
My brain hurts.
As a postscript to this, even if I were able to start the program under gdb, trace into the new process created, and pass on the segfault to the signal handler, it wouldn't reveal the problem because gdb uses the full path to the program when running it, thus masking the real problem.