Saturday, January 12, 2013
A few notes about yesterday's crashes
I wrote crashreport() in an attempt to find out why glibc was reporting a double free (or memory corruption), so imagine my surprise when I found other crashes happening. I did find the root causes for the crashes yesterday, but I have yet to figure out why the memory corruption happened.
First off, no points to Apache for failing to report the unexpected termination of a child process. I can certainly understand that the Apache developers don't expect anyone to use CGI anymore, and that if people do, they'll use a CGI program developed in a scripting language that probably won't core dump. But still, they make the CGI module, and the program the CGI module executes can be written in anything, so hiding the fact that a program crashed due to SIGSEGV or SIGABRT is, to me, inexcusable.
Had Apache logged the crash, I probably would have found the error a few years ago (seriously). The actual crash only happened after the output was generated and sent to the browser, so I never saw anything unusual. And because Apache never said anything about a crash, well … everything is okay, right?
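For reference, a parent process on a POSIX system can tell when a child was killed by a signal just by inspecting the status that waitpid() hands back. What follows is only a minimal sketch of that check, not Apache's code; the name run_and_report() and its return convention are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>     /* signal names such as SIGSEGV and SIGABRT */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` as a child process and wait for it.
 * Returns the number of the signal that killed the child, 0 if the
 * child exited on its own, or -1 if fork() or waitpid() failed. */
int run_and_report(const char *path, char *const argv[])
{
  assert(path != NULL && argv != NULL);

  pid_t child = fork();
  if (child == -1)
  {
    perror("fork");
    return -1;
  }

  if (child == 0)
  {
    execv(path, argv);
    _exit(127);         /* only reached if execv() failed */
  }

  int status;
  if (waitpid(child, &status, 0) == -1)
  {
    perror("waitpid");
    return -1;
  }

  if (WIFSIGNALED(status))
  {
    /* This is the case worth a line in the error log. */
    int sig = WTERMSIG(status);
    fprintf(stderr, "child %ld (%s) terminated by signal %d\n",
            (long)child, path, sig);
    return sig;
  }

  if (WIFEXITED(status))
    fprintf(stderr, "child %ld (%s) exited with status %d\n",
            (long)child, path, WEXITSTATUS(status));

  return 0;
}
```

A caller would compare the return value against SIGSEGV or SIGABRT and log accordingly; a server juggling many children would more likely reap them from a SIGCHLD handler or a polling loop, but the status check is the same.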
Second, the crash was in a seldom-used code path: specifically, when the addentry.html page was requested. I normally use email to create entries, not the web interface. It's not like I never use the web interface, but I can safely count on two hands the number of times I've used it over the past thirteen years. So to say it doesn't get a lot of use is an understatement.
Now, are there features I don't use? Yes. And such code is currently commented out. That code was written at a time when I expected other people might use the codebase, but alas, only one other person ever used mod_blog (only to stop blogging due to personal reasons) and now, as far as I know, I'm the only one who uses this codebase. That doesn't bother me, but it does indicate that I should probably remove the code that I don't use.
But the web interface? I use it just enough to justify its existence in the codebase.
Third, the addition of the command line and environment variables to the output of crashreport() (and I solved the global variable issues I had) certainly helped with the diagnosis. It revealed a request that would reliably crash the program (the aforementioned addentry.html page), and with a reliable way to crash the program, it's easy to isolate the buggy code (if a bit tedious).
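To give a rough idea of what dumping the command line and environment from a crash handler can look like, here is a minimal sketch. It is not the actual crashreport() code; crash_install(), dump_context() and the output layout are all assumptions made for illustration. It sticks to write() because stdio isn't safe to call from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* argv saved at startup so the handler can reach it (hypothetical name) */
static char **g_argv;

/* Write a string with write(), retrying on short writes. */
static void put(int fd, const char *s)
{
  size_t len = strlen(s);
  while (len > 0)
  {
    ssize_t n = write(fd, s, len);
    if (n <= 0)
      return;
    s   += n;
    len -= (size_t)n;
  }
}

/* Dump two NULL-terminated string vectors, one entry per line. */
void dump_context(int fd, char *const argv[], char *const envp[])
{
  put(fd, "COMMAND LINE:\n");
  for (size_t i = 0; argv != NULL && argv[i] != NULL; i++)
  {
    put(fd, "\t"); put(fd, argv[i]); put(fd, "\n");
  }

  put(fd, "ENVIRONMENT:\n");
  for (size_t i = 0; envp != NULL && envp[i] != NULL; i++)
  {
    put(fd, "\t"); put(fd, envp[i]); put(fd, "\n");
  }
}

static void crash_handler(int sig)
{
  put(STDERR_FILENO, "CRASH\n");
  dump_context(STDERR_FILENO, g_argv, environ);
  /* SA_RESETHAND already restored the default action, so re-raising
   * terminates the process with the original signal once we return. */
  raise(sig);
}

/* Install the handler for SIGSEGV and SIGABRT; 0 on success, -1 on error. */
int crash_install(char **argv)
{
  assert(argv != NULL);
  g_argv = argv;

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = crash_handler;
  sa.sa_flags   = SA_RESETHAND;
  sigemptyset(&sa.sa_mask);

  if (sigaction(SIGSEGV, &sa, NULL) == -1) return -1;
  if (sigaction(SIGABRT, &sa, NULL) == -1) return -1;
  return 0;
}
```

Calling crash_install(argv) as the first thing in main() is enough to get the dump on stderr. For a CGI program the environment section is the valuable part, since variables such as REQUEST_METHOD and QUERY_STRING describe the request that triggered the crash.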
And to tell the truth, the bug has existed since May 26th, 2009, when I made the following commit:
Basically, I rewrote the core blogging engine over the past twelve hours. I still have yet to support adding new entries via the engine, but until I get that fixed, I can add them manually.
only I didn't quite update all the code properly. And since the code path in question isn't executed except when called as a CGI program (I should note that mod_blog can be run from the command line as well), and Apache never logs CGI programs that crash, no wonder I never saw this bug.