The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, March 09, 2007

Cheating your way to a robust daemon

Programs written in Erlang have minimal (if any) error checking. The designers of Erlang intend for buggy Erlang code to crash early and hard. No defensive programming for these guys. That seems odd, given that Erlang is used primarily in phone switches, which have ridiculous uptime and reliability requirements, but it really isn't.

You see, most Erlang programs are watched over by an even simpler program that does nothing but wait for a program to crash, log the incident, and restart it automatically.
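For comparison, here is a rough sketch in C of what such a watcher might look like as a separate process. This is not Erlang's supervisor, just an illustration of the idea; the names run_once() and supervise() are made up for this example, and the one-second pause between restarts is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program in argv once and wait for it.
 * Returns the signal number that killed it, 0 if it exited on its own,
 * or -1 if fork() or waitpid() failed. */
int run_once(char *const argv[])
{
  pid_t child = fork();
  int   status;

  if (child == -1)
    return -1;

  if (child == 0)
  {
    execvp(argv[0], argv);
    _exit(127);               /* only reached if the exec failed */
  }

  if (waitpid(child, &status, 0) == -1)
    return -1;

  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical supervisor loop: restart the program for as long as it
 * keeps dying from a signal, logging each incident. */
void supervise(char *const argv[])
{
  int sig;

  while ((sig = run_once(argv)) > 0)
  {
    fprintf(stderr, "%s killed by signal %d---restarting\n", argv[0], sig);
    sleep(1);                 /* arbitrary pause to avoid a tight restart loop */
  }
}
```

A main() that calls supervise(argv + 1) would turn this into a small wrapper that keeps whatever command follows it on the command line running.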

It's a pretty neat concept, and for the daemon I'm writing, I've done just that.

Well, I don't actually have a separate process watching, because one isn't needed. No, what I've done is catch a few of the signals that would otherwise kill the program (like SIGSEGV) and, instead of terminating the program, restart it.

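#include <limits.h>   /* OPEN_MAX, on systems that define it */
#include <signal.h>
#include <stdlib.h>
#include <syslog.h>
#include <unistd.h>

extern char **environ;
char        **global_argv;

void crash_recovery(int);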

int main(int argc,char *argv[])
{
  /*----------------------------------
  ; save our command line.  I do this
  ; so *if* we re-exec ourselves, we
  ; re-exec ourselves as we were initially
  ; exec'ed.
  ;----------------------------------*/

  global_argv = argv;

  /*--------------------------------------
  ; Wrote my own signal() function that ensures
  ; reliable signal semantics (via W. Richard Stevens'
  ; Advanced Programming in the UNIX Environment).
  ; 
  ; Here, I'm capturing those signals that
  ; may be the result of bad programming and
  ; end up generating a core file, and catching
  ; those to restart the program.
  ;----------------------------------------*/

  set_signal(SIGSEGV,crash_recovery);
  set_signal(SIGBUS, crash_recovery);
  set_signal(SIGFPE, crash_recovery);

  /* ... */

  return(EXIT_SUCCESS);
}

void crash_recovery(int sig)
{
  sigset_t sigset;
  int      i;

  syslog(LOG_ERR,"received sig %d---restarting",sig);

  /*--------------------------------------------
  ; The signal we're handling may very well be
  ; blocked, which will persist across the execve()
  ; call.  This results in the first crash being
  ; caught, but not subsequent crashes.  By unblocking
  ; all signals, we ensure we can catch further
  ; crashes.
  ;----------------------------------------------*/

  sigfillset(&sigset);
  sigprocmask(SIG_UNBLOCK,&sigset,NULL);

  /*---------------------------------------
  ; close all the files, but keep the standard
  ; STDIN, STDOUT and STDERR open.  Sure, we
  ; lose any connections, but we'd lose them
  ; anyway if the program were to go away.
  ; A failed close() (say, on a descriptor that
  ; was never open) is ignored so that a gap in
  ; the descriptors doesn't end the loop early.
  ;----------------------------------------*/

  for (i = 3 ; i < OPEN_MAX ; i++)
    close(i);

  /*---------------------------------------
  ; restart myself.
  ;------------------------------------*/

  execve(global_argv[0],global_argv,environ);

  /*----------------------------------
  ; well ... if we get here, we're screwed
  ; so might as well give up.
  ;------------------------------------*/

  _exit(EXIT_FAILURE);
}

This could be a very bad idea but I'll see how well it works out.

Obligatory Picture

[Here I am, enjoying my vacation in a rain forest.]

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, which makes the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2017 by Sean Conner. All Rights Reserved.