Friday, January 11, 2013
Did I repeat myself? Did I repeat myself? Did I repeat myself? Let me fix that.
“Did you mean to post four copies of your post?” asked Bunny.
“No, why?”
“Because there are four copies of your post, that's why,” said Bunny.
“How do you link to the post in question like that when you speak?” I asked. She shook her head and walked away. “Bunny?”
So yes, there was a glitch last night. The only way for four copies to show up like that would be for four copies to be sent via email (which is my preferred interface). And in checking the logs, I did indeed see four copies show up:
Jan 11 05:03:58 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=3, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x09cf8d38 *** )
Jan 11 05:27:10 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=1395, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x08646d38 *** )
Jan 11 06:00:29 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=3394, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x0934bd38 *** )
Jan 11 07:07:09 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=7394, status=sent (delivered to command: /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new)
The fact that four copies showed up means that mod_blog crashed when exiting. I find it disturbing that I've been running the latest version since the 1st and this is the first time it has crashed. Not only that, but it crashed three times in a row and then worked.

Such bugs are terrifying.
So, inspired by the crash reports in Redis, I decided to roll my own, and it was surprisingly easy, given the output (from a test of the code):
CRASH: pid=6103 signal='Aborted'
CRASH: reason='Unspecified/untranslated error'
CRASH: CS=0073 DS=C02E007B ES=007B FS=0000 GS=0033
CRASH: EIP=00B767A2 EFL=00000246 ESP=BFE162F8 EBP=BFE1630C ESI=000017D7 EDI=00CBBFF4
CRASH: EAX=00000000 EBX=000017D7 ECX=000017D7 EDX=00000006
CRASH: UESP=BFE162F8 TRAPNO=00000000 ERR=00000000
CRASH: STACK DUMP
CRASH: BFE162F8:                         C5 78 BB 00 F4 BF CB 00
CRASH: BFE16300: F4 BF CB 00 00 00 00 00 C0 C6 F0 B7 3C 64 E1 BF
CRASH: BFE16310: 29 93 BB 00 06 00 00 00 20 63 E1 BF 00 00 00 00
CRASH: BFE16320: 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE163A0: A7 C4 CB 00 A8 C4 CB 00 F4 BF CB 00 A7 C4 CB 00
CRASH: BFE163B0: 60 C4 CB 00 D8 63 E1 BF 73 D0 C4 00 F4 BF CB 00
CRASH: BFE163C0: 4F E7 BE 00 02 00 00 00 B8 C6 54 08 42 00 00 00
CRASH: BFE163D0: B8 C6 54 08 42 00 00 00 14 64 E1 BF 9C E9 BE 00
CRASH: BFE163E0: 60 C4 CB 00 B8 C6 54 08 42 00 00 00 32 30 31 34
CRASH: BFE163F0: A7 C4 CB 00 00 00 00 00
CRASH: STACK TRACE
CRASH: /home/spc/source/boston/build/boston[0x805d3c4]
CRASH: /home/spc/source/boston/build/boston[0x805daa2]
CRASH: /lib/tls/libc.so.6[0xbb79b0]
CRASH: /lib/tls/libc.so.6(abort+0xe9)[0xbb9329]
CRASH: /lib/tls/libc.so.6(__assert_fail+0x101)[0xbb0e41]
CRASH: /home/spc/source/boston/build/boston[0x8053f30]
CRASH: /home/spc/source/boston/build/boston[0x805d0b9]
CRASH: /home/spc/source/boston/build/boston(ChunkProcessStream+0xc8)[0x805d32e]
CRASH: /home/spc/source/boston/build/boston(ChunkProcess+0xcb)[0x805d240]
CRASH: /home/spc/source/boston/build/boston(generic_cb+0x74)[0x8053a68]
CRASH: /home/spc/source/boston/build/boston(pagegen_days+0x1c4)[0x8057f5c]
CRASH: /home/spc/source/boston/build/boston(generate_pages+0xd0)[0x8057bfc]
CRASH: /home/spc/source/boston/build/boston[0x80501bb]
CRASH: /home/spc/source/boston/build/boston(main_cli+0x1e5)[0x80500cd]
CRASH: /home/spc/source/boston/build/boston(main+0x133)[0x8051eb3]
CRASH: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xba4e93]
CRASH: /home/spc/source/boston/build/boston[0x804cdbd]
POSIX allows you to register a signal handler that receives additional contextual information. That context includes the contents of the CPU registers; it's just a matter of finding the structure definition in a header file. Once I have the registers, it's not hard to produce the stack dump.
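Here's a minimal sketch of that idea (not the actual mod_blog code), assuming 32-bit Linux/x86 and glibc, since that's what the register names in the report above imply. The handler is registered with sigaction() and SA_SIGINFO, and the registers come out of the ucontext_t passed as the third argument:

/* Sketch only: catch a fatal signal and report the registers.
 * Assumes 32-bit Linux/x86 and glibc; REG_EIP and friends need _GNU_SOURCE. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <ucontext.h>

static void crash_handler(int sig,siginfo_t *info,void *context)
{
  ucontext_t *uc = context;
  (void)info;

  /* fprintf() is not async-signal-safe; a real handler would format into
   * a static buffer and write() it out, but this shows the idea */
  fprintf(stderr,"CRASH: pid=%d signal=%d\n",(int)getpid(),sig);
  fprintf(stderr,"CRASH: EIP=%08lX ESP=%08lX EBP=%08lX\n",
          (unsigned long)uc->uc_mcontext.gregs[REG_EIP],
          (unsigned long)uc->uc_mcontext.gregs[REG_ESP],
          (unsigned long)uc->uc_mcontext.gregs[REG_EBP]);

  /* dump a few bytes of the stack, starting at ESP */
  unsigned char *sp = (unsigned char *)uc->uc_mcontext.gregs[REG_ESP];
  for (int i = 0 ; i < 16 ; i++)
    fprintf(stderr,"%02X ",sp[i]);
  fprintf(stderr,"\n");

  /* restore the default handler and re-raise so the process still dies */
  signal(sig,SIG_DFL);
  raise(sig);
}

int main(void)
{
  struct sigaction sa;

  memset(&sa,0,sizeof(sa));
  sa.sa_sigaction = crash_handler;
  sa.sa_flags     = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV,&sa,NULL);
  sigaction(SIGABRT,&sa,NULL);

  abort(); /* force a crash to exercise the handler */
}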
The stack trace was even easier than that. It seems that the Linux C library comes with a few functions to do just that: backtrace() collects the return addresses off the stack; backtrace_symbols() resolves those addresses back to function names; and backtrace_symbols_fd() does the same, only it writes the information to a file descriptor. I ended up using backtrace_symbols_fd() since I'm executing in the context of a signal handler. I know that open(), read() and close() are marked as “async-signal-safe” (that is, safe to call in a signal handler) by POSIX, but malloc() and free() are not, and backtrace_symbols() is known to call malloc() (and thus I would need to call free()); I don't want to risk a crash in the crash-report code. I'm still taking a risk, because backtrace() and backtrace_symbols_fd() could call non-async-signal-safe functions, but I'm doing as much as I can to get as much context as I can, as safely as I can.
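As a rough illustration of that last part (again, not the actual mod_blog code), something along these lines is all it takes; backtrace() gathers the return addresses and backtrace_symbols_fd() writes the symbolic trace straight to a file descriptor, so no malloc() or free() is needed:

#include <execinfo.h>

#define MAX_FRAMES 128

static void dump_stack_trace(int fd)
{
  void *frames[MAX_FRAMES];
  int   depth;

  depth = backtrace(frames,MAX_FRAMES);   /* collect the return addresses      */
  backtrace_symbols_fd(frames,depth,fd);  /* resolve and write them, no malloc */
}

Compile with -rdynamic so the function names, and not just raw addresses, show up in the trace; that's how names like ChunkProcessStream appear in the output above.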