Tuesday, October 16, 2007
Heisenbugs … they're everywhere!
So I ran the greylist daemon for over eight hours under valgrind without it once hanging. I then restarted the server, this time running on its own, without valgrind.
A few hours later, it hung.
And just for the record, when I attach to the normally running process using gdb, it's where I would expect it to be:
(gdb) where
#0  0x008067a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x003c2dd1 in recvfrom () from /lib/tls/libc.so.6
#2  0x08049411 in mainloop (sock=0) at src/main.c:88
#3  0x080493a6 in main (argc=1, argv=0xbfe5c084) at src/main.c:68
but when the process hangs:
(gdb) where
#0  0x00dff7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00955e5e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2  0x008e7e4f in _L_mutex_lock_10231 () from /lib/tls/libc.so.6
#3  0x00000000 in ?? ()
I have no clue as to what's going on (and neither does gdb, apparently). Running the program under valgrind obviously changes the environment enough to mask the condition that causes the bug in the first place.
This is proving to be a very difficult bug to find.