The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, April 08, 2004

It works, but mysteriously crashes after a day or so …

A program I'm trying to run (for a small side project) keeps crashing. Well, “crashing” isn't the right term—it technically doesn't crash, but calls exit() when certain errors occur. The error in question happens with the following code:

x = fcntl(fd, F_GETFL, &fl);
if (x < 0)
{
  syslog(LOG_ERR, "fcntl F_GETFL: FD %d: %s", fd, strerror(errno));
  exit(1);
}

and the error in question is:

fcntl F_GETFL: FD -1: Bad file descriptor

It's in a function called set_nonblock(), which takes a file descriptor (a reference to an open file) as a parameter and makes two calls to fcntl(); it's failing with an invalid file descriptor on the first call. So I check the code that calls set_nonblock(). There are only two locations where set_nonblock() is called, and in both cases the file descriptor is checked before the call, which means that the file descriptor is being clobbered between the initial test and the call.

Not good.

So I add more logging, and run again (mind you, this is over the course of several days).

I finally get a location:

stp.c:233: failed assertion newsock >= 0

Okay, check the code:

int wait_for_connection(int s)
{
  int                newsock;
  int                len;
  struct sockaddr_in peer;

  ddt(s > -1);

    len = sizeof(struct sockaddr_in);
    newsock = accept(s, (struct sockaddr *) &peer, &len);
    /* dump_sockaddr (peer, len); */
    if (newsock < 0) {
        if (errno != EINTR)
            perror("accept");
    }
    get_hinfo_from_sockaddr(peer, len, client_hostname);
    ddt(newsock >= 0);
    set_nonblock(newsock);
    return (newsock);
}

Line 233 is the ddt(newsock >= 0) check just before the call to set_nonblock(). ddt() (a function I wrote) checks the condition and, if it is false, logs it (via syslog()) and exits the program. And now I see the error. It's subtle, but it's there. The fragment:

newsock = accept(s, (struct sockaddr *) &peer, &len);

if (newsock < 0) {
  if (errno != EINTR)
    perror("accept");
}

is the culprit.

Under Unix, a system call (like accept()) can be interrupted, and if so, the call fails with an error code of EINTR. Why could a system call be interrupted? Well, say a program creates a child process (which this one does), and that child does its job and exits; the parent process (which created the child process) is then “interrupted” with a message (the SIGCHLD signal): “your child process has finished.” Normally, if a system call is interrupted, you want to try the system call again, only this code doesn't do that! (It looks like the author intended to call accept() again but forgot to write that code.) Instead, the function falls through with newsock still set to -1 and hands that to set_nonblock().
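To see the interruption described above in isolation, here is a small sketch that has nothing to do with the original program. It blocks in read() on an empty pipe (instead of accept() on a socket, purely to keep it short) while a child process exits. The names on_sigchld() and errno_after_child_exit() are invented for the example, and the handler is installed without SA_RESTART on purpose, since that flag would make the kernel restart the call rather than fail it.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations used below */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Empty handler: all it has to do is exist so SIGCHLD interrupts the parent. */
static void on_sigchld(int sig)
{
  (void)sig;
}

/* Hypothetical demo: returns the errno left by a blocking read() that was
 * interrupted by a child's exit, 0 if read() returned normally, or -1 if
 * the setup itself failed. */
int errno_after_child_exit(void)
{
  struct sigaction sa;
  struct timespec  pause_for = { 0, 200000000L };  /* 0.2 seconds */
  int              pipefd[2];
  char             byte;
  pid_t            pid;
  ssize_t          n;
  int              seen;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
  if (sigaction(SIGCHLD, &sa, NULL) < 0)
    return -1;

  if (pipe(pipefd) < 0)
    return -1;

  pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
  {
    /* Child: give the parent time to block in read(), then exit. */
    nanosleep(&pause_for, NULL);
    _exit(0);
  }

  /* Parent: nothing is ever written and the write end stays open here,
   * so read() blocks until the child's SIGCHLD arrives. */
  n    = read(pipefd[0], &byte, 1);
  seen = (n < 0) ? errno : 0;

  waitpid(pid, NULL, 0);           /* reap the child */
  close(pipefd[0]);
  close(pipefd[1]);
  return seen;
}
```

On a typical system the function returns EINTR, which is the same failure accept() reported in the program above.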

Patch the code:

int wait_for_connection(int s)
{
  int                newsock;
  int                len;
  struct sockaddr_in peer;

  ddt(s > -1);

  do
  {
    /* accept() overwrites len, so reset it on every attempt */
    len     = sizeof(struct sockaddr_in);
    newsock = accept(s,(struct sockaddr *) &peer,&len);
    if (newsock < 0)
    {
      /* EINTR only means "interrupted, try again"; anything else is a real error */
      if (errno != EINTR)
      {
        perror("accept");
        return(-1);
      }
    }
  } while (newsock < 0);

  get_hinfo_from_sockaddr(peer,sizeof(struct sockaddr_in),client_hostname);
  set_nonblock(newsock);
  return(newsock);
}

and try again. Hopefully, this (and some other minor cleanup) will fix the problem.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.