Thursday, June 18, 2015
It's not even a Heisenbug
I don't think it's a case of one thread releasing a mutex locked by another thread.
Whatever the bug, I can reproduce it rather consistently and every crash has always been along the same call chain. The mechanism used to trigger the problematic call chain is a way to work around a deficiency in POSIX (which I'm sure the original author of said mechanism would say stands for “Piece of XXXX In eXecution”)—not that I blame the original author, Unix and threads don't really mix that well (but then in my opinion, if it wasn't in Unix Version 7, it's not supported well or has a horrible interface—threads, networking, removable storage, graphical user interfaces, all have … issues under Unix).
But as of now, I've yet to figure out the actual root cause of the assert:
tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)' failed.
Every source I've found on the web with that bug states it has to do with an uninitialized mutex attribute. The code isn't using mutex attributes, and even when I added code to use a mutex attribute (initialized!) it still asserts. Sigh.
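For reference, the usual explanation in those reports is a pthread_mutexattr_t handed to pthread_mutex_init() without a prior pthread_mutexattr_init(), so the mutex picks up garbage settings. Below is a minimal sketch of the initialized form. It is not the code under test, and the helper name init_checked_mutex is made up for illustration. It also requests an error-checking mutex, which is an assumption on my part rather than anything the failing code does: with that type, an unlock from a thread that does not hold the mutex returns EPERM instead of silently misbehaving.

```c
/* Request POSIX.1-2008 so PTHREAD_MUTEX_ERRORCHECK is declared. */
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not from the code under test): create an
   error-checking mutex from an explicitly initialized attribute.
   Returns 0 on success or a pthread error number on failure. */
int init_checked_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc;

    /* The step the bug reports say is usually missing. */
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Error-checking type: relocking by the owner yields EDEADLK and
       unlocking a mutex the caller does not hold yields EPERM. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex built this way can help rule out (or confirm) the "wrong thread unlocked it" theory, since every lock and unlock call reports misuse through its return value, provided the callers actually check it.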
And the even weirder thing is that I don't think the component that's crashing is even being used by what I'm testing! See, the component I'm testing, E, makes a service request to T. T requires service X (the one that is crashing) to be running, but the requests from E shouldn't cause T to make a request to X. X just sits there and periodically logs a bunch of stats that basically show it's not doing a whole lot of work.
I got nothing.