The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, March 05, 2014

The Case Of The Missing Core Files

At work, I test the various components of “Project: Wolowizard.” These tests usually require running multiple copies of a program on a single computer. I use Lua (with help from a module) to start and monitor the programs being tested. The code starts N copies, and if any of the programs crash, the reason is logged. It's fairly straight forward code.

Now, one of the compents of “Project: Wolowizard” was updated to support a new project (“Project: Sippy-Cup”) and that component is occasionally crashing on an assert, but the problem is: there are no core files to check.

And I've spent the past two days trying to figure out why there are no core files to check.

The first culprit—have we told the system not to generate core files? Yup. The account under which the program runs (root) has a core file size limit of zero bytes. There are a few ways to fix this, and I picked what to me, was the simplest solution: in the Lua script that runs the programs, set the core file size to “unlimited.” And this is easy enough to do:

proc = require "org.conman.process"
proc.limits.hard.core = "inf"
proc.limits.soft.core = "inf"

Slight digression: you can set various resource limits for things like maximum memory usage to core file size. The hard limit normally can't be changed, but the soft limit can—any process an lower a limit. But a process running as root can raise a limit, and raise the hard limit. Since the program I'm running is running as root, setting both the hard and soft limits to “infinity” is easy.

But there was still a disturbing lack of core files.

I checked the code of the Lua module I was using, and yes, I flubbed the parsing code. I made the fix, my tests showed I got the logic right, installed the updated module and still, no core files.

I did a bunch more tests and checked off the following reasons for the lack of core files: it wasn't because the program dropped permissions; it wasn't because the program couldn't write the core file in its current working directory; and the program is not setuid. It was clear there was something wrong the module.

I was able to isolate the issue to the following:

struct rlimit limit;
lua_Number    ival;

/* ... */

if (lua_isnumber(L,3))
  ival = lua_tonumber(L,3);

/* ... */

if (ival >= RLIM_INFINITY)
  ival = RLIM_INFINITY;
  
limit.rlim_cur = ival;

Now, lua_Number is of type double (a floating point value), and imit.rlim_cur is some form of integer. ival was properly HUGE_VAL (the C floating point equivalent of “infinity”) but limit.rlim_cur was 0.

But it worked on my home system just fine.

Then it dawned on me—my home system was a 32-bit system! That was the system I did the patch and initial test; the systems at work are all 64-bit systems. Some digging revealed that the definition of RLIM_INFINITY on the 64-bit system was

((unsigned long int)(~0UL))

or in other words: the largest unsigned long integer value. And on a 64-bit system, an unsigned long integer is 64-bits in size.

I do believe I was bit by an IEEE 754 floating point implementation detail.

Lua treats all numbers as type double, and on modern systems, that means IEEE 754 floating point. A double can store 53-bit integers without loss, and on a 32-bit system, you can pass integer values into and out of a double without issue (32 being less than 53) and because I did my initial testing of the Lua module on a 32-bit system, there was no issue.

But on a 64-bit system … it gets interesting. Doing some empirical testing, I found the largest integer value you can store into a double and get something out is 18,446,744,073,709,550,591 (and what you get out is 18,446,744,073,709,549,568—I'll leave the reason for the discrepancy for the reader); anything larger, you get zero back out.

So, no wonder I wasn't getting any core files! I was inadvertantly setting the core file size to zero bytes!

Sigh.

Off to fix the code …

Obligatory Picture

[I'm wearing a goatee—I must be my evil twin brother.]

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2014 by Sean Conner. All Rights Reserved.

Listed on BlogShares