The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, May 05, 2010

Millions of moving parts

In a system of a million parts, if each part malfunctions only one time out of a million, a breakdown is certain.

—Stanislaw Lem

In between paying work, I'm getting syslogintr ready for release—cleaning up the Lua scripts, adding licensing information, making sure everything I have actually works, that type of thing. I have quite a few scripts that isolated some aspects of working scripts—for instance, checking for ssh attempts and blocking the offending IP but weren't fully tested. A few were tested (as I'm using them at home), but not all.

I update the code on my private server, rewrite its script to use the new modules (as I'm calling them) only to watch the server seize up tight. After a few hours of debugging, I fixed the issue.

Only it wasn't with my code.

But first, the scenario I'm working with. Every hour, syslogintr will check to see if the webserver and nameserver are still running (why here? Because I can, that's why) and log some stats gathered from those processes. The checks are fairly easy—for the webserver I query mod_status and log the results; for the nameserver, I pull the PID from /var/run/named.pid and from that, check to see if the process exists. If they're both running, everything is fine. It was when both were not running that syslogintr froze.

Now, when the appropriate check determines that the process isn't running it not only logs the situation, but sends me an email to alert me of the situation. If only one of the two processes were down, syslogintr would work fine. It was only when both were down that it froze up solid.

I thought it was another type of syslog deadlockPostfix spews forth multiple log entries for each email going through the system and it could be that too much data is logged before syslogintr can read it, and thus, Postfix blocks, causing syslotintr to block, and thus, deadlock.

Sure, I could maybe increase the socket buffer size, but that only pushes the problem out a bit, it doesn't fix the issue once and for all. But any real fix would probably have to deal with threads, one to just read data continuously from the sockets and queue them up, and another one to pull the queued results and process them, and that would require a major restructure of the whole program (and I can't stand the pthreads API). Faced with that, I decide to see what Stevens has to say about socket buffers:

With UDP, however, when a datagram arrives that will not fit in the socket receive buffer, that datagram is discarded. Recall that UDP has no flow control: It is easy for a fast sender to overwhelm a slower receiver, causing datagrams to be discarded by the receiver's UDP

Hmm … okay, according to this, I shouldn't get deadlocks because nothing should block. And when I checked the socket receive buffer size, it was way larger than I expected it to be (around 99K if you can believe it) so even if a process could be blocked sending a UDP packet, Postfix (and certainly syslogintr wasn't sending that much data.

And on my side, there wasn't much code to check (around 2300 lines of code for everything). And when a process list showed that sendmail was hanging, I decided to start looking there.

Now, I use Postfix, but Postfix comes with a “sendmail” executable that's compatible (command line wise) with the venerable sendmail. Imagine my surprise then:

[spc]brevard:~>ls -l /usr/sbin/sendmail
lrwxrwxrwx  1 root root 21 Feb  2  2007 /usr/sbin/sendmail -> /etc/alternatives/mta
[spc]brevard:~>ls -l /etc/alternatives/mta
lrwxrwxrwx  1 root root 26 May  5 16:30 /etc/alternatives/mta -> /usr/sbin/sendmail.sendmail

Um … what the … ?

[spc]brevard:~>ls -l /usr/sbin/sendmail*
lrwxrwxrwx  1 root root      21 Feb  2  2007 /usr/sbin/sendmail -> /etc/alternatives/mta
-rwxr-xr-x  1 root root  157424 Aug 12  2006 /usr/sbin/sendmail.postfix
-rwxr-sr-x  1 root smmsp 733912 Jun 14  2006 /usr/sbin/sendmail.sendmail

Oh.

I was using sendmail's sendmail instead of Postfix's sendmail all this time.

Yikes!

When I used Postfix's sendmail everything worked perfectly.

Sigh.


mod_lua patched

And speaking of bugs, the bug I submitted to Apache was fixed!

Woot!

Obligatory Picture

[“I am NOT a number, I am … a Q-CODE!”]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.