Tuesday, January 29, 2013
I'm terribly upset that GCC didn't start NetHack
Ah yes, undefined behavior of C. It's easy to see in retrospect, but it's still a bit surprising. The code:
#include <limits.h>

int foo(int a,int b)
{
  return a / b;
}

int main(void)
{
  return foo(INT_MIN,-1);
}
And when you compile with the right options …
[spc]lucy:/tmp>gcc -std=c99 -O0 crash.c
[spc]lucy:/tmp>./a.out
Floating point exception (core dumped)
[spc]lucy:/tmp>
What's actually going on here? Well, I compiled the code with crashreport(), so I could capture the crash:
CRASH(9372/000): pid=9372 signal='Floating point exception'
CRASH(9372/001): reason='Integer divide-by-zero'
CRASH(9372/002): pc=0x804883d
CRASH(9372/003): CS=0073 DS=007B ES=007B FS=0000 GS=0033
CRASH(9372/004): EIP=0804883D EFL=00010296 ESP=BFFC370C EBP=BFFC3710 ESI=BFFC37C4 EDI=BFFC3750
CRASH(9372/005): EAX=80000000 EBX=00CBBFF4 ECX=BFFC371C EDX=FFFFFFFF
CRASH(9372/006): UESP=BFFC370C TRAPNO=00000000 ERR=00000000
CRASH(9372/007): STACK DUMP
CRASH(9372/008): BFFC370C: 1C 37 FC BF
CRASH(9372/009): BFFC3710: 38 37 FC BF 7C 88 04 08 00 00 00 80 FF FF FF FF
CRASH(9372/010): BFFC3720: F4 BF CB 00 F4 BF CB 00 F8 A9 04 08 F4 BF CB 00
CRASH(9372/011): BFFC3730: 00 00 00 00 A0 CC B8 00 98 37 FC BF 93 4E BA 00
CRASH(9372/012): BFFC3740: 01 00 00 00 C4 37 FC BF CC 37 FC BF 26 22 B8 00
CRASH(9372/013): BFFC3750: F4 BF CB 00 00 00 00 00 50 37 FC BF 98 37 FC BF
CRASH(9372/014): BFFC3760: 40 37 FC BF 55 4E BA 00 00 00 00 00 00 00 00 00
CRASH(9372/015): BFFC3770: 00 00 00 00 D4 CF B8 00 01 00 00 00 80 87 04 08
CRASH(9372/016): BFFC3780: 00 00 00 00 60 21 B8 00 B0 2C B8 00 D4 CF B8 00
CRASH(9372/017): BFFC3790: 01 00 00 00 80 87 04 08 00 00 00 00 A1 87 04 08
CRASH(9372/018): BFFC37A0: 47 88 04 08 01 00 00 00 C4 37 FC BF CC 92 04 08
CRASH(9372/019): BFFC37B0: 20 93 04 08 B0 2C B8 00 BC 37 FC BF 92 9A B8 00
CRASH(9372/020): BFFC37C0: 01 00 00 00 BB A9 FF BF 00 00 00 00 C3 A9 FF BF
CRASH(9372/021): BFFC37D0: E4 A9 FF BF F4 A9 FF BF FF A9 FF BF 0D AA FF BF
CRASH(9372/022): BFFC37E0: 34 AA FF BF 51 AA FF BF 79 AA FF BF 95 AA FF BF
CRASH(9372/023): BFFC37F0: A7 AA FF BF B8 AA FF BF CE AA FF BF EC AA FF BF
CRASH(9372/024): BFFC3800: F5 AA FF BF 04 AB FF BF C7 AC FF BF
CRASH(9372/025): STACK TRACE
CRASH(9372/026): ./a.out[0x804889c]
CRASH(9372/027): ./a.out[0x8049078]
CRASH(9372/028): /lib/tls/libc.so.6[0xbb79b0]
CRASH(9372/029): ./a.out[0x804887c]
CRASH(9372/030): /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xba4e93]
CRASH(9372/031): ./a.out[0x80487a1]
CRASH(9372/032): DONE
And from there, we can load up the program and do some disassembly:
[spc]lucy:/tmp>gdb a.out
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"…Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) disassemble 0x804883d
Dump of assembler code for function foo:
0x08048828 <foo+0>: push %ebp
0x08048829 <foo+1>: mov %esp,%ebp
0x0804882b <foo+3>: sub $0x4,%esp
0x0804882e <foo+6>: mov 0x8(%ebp),%edx
0x08048831 <foo+9>: lea 0xc(%ebp),%eax
0x08048834 <foo+12>: mov %eax,0xfffffffc(%ebp)
0x08048837 <foo+15>: mov %edx,%eax
0x08048839 <foo+17>: mov 0xfffffffc(%ebp),%ecx
0x0804883c <foo+20>: cltd
0x0804883d <foo+21>: idivl (%ecx)
0x0804883f <foo+23>: mov %eax,0xfffffffc(%ebp)
0x08048842 <foo+26>: mov 0xfffffffc(%ebp),%eax
0x08048845 <foo+29>: leave
0x08048846 <foo+30>: ret
End of assembler dump.
(gdb)
It faulted on the IDIV instruction, but it wasn't technically an “integer division-by-zero.” The Intel 80386 book I have (and the Pentium™ in my computer is little more than a glorified Intel 80386) describes IDIV as:
An 80386 interrupt zero (0) [which is reported as an “Integer division-by-zero”] is taken if a zero divisor or a quotient too large for the destination register is generated. [emphasis added]
Now, EAX is -2,147,483,648 (80000000 in hexadecimal notation), which can be represented in 32 bits. (We're running 32-bit code here; the issue still happens on 64-bit systems, but with 64-bit operands the value will be vastly larger.) But -2,147,483,648 divided by -1 should be 2,147,483,648, and 2,147,483,648 cannot be represented in 32 bits. [Editor: Technically, the value can be represented in 32 bits, but the instruction in question, IDIV, is a signed instruction, and because of the way Intel does signed integer math (two's complement), the largest 32-bit signed quantity is 2,147,483,647, so 2,147,483,648 cannot be represented as a 32-bit signed quantity.] Thus, because the quotient is considered “too large,” we get the fault which ends the program.
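The arithmetic can be checked without crashing anything. Here is a minimal sketch, assuming a hypothetical helper named idiv32_would_fault() (my name, not something from the 80386 book or crashreport()). It also simplifies things: the real IDIV divides the 64-bit value in EDX:EAX, but since CLTD just sign-extends EAX, a 32-bit dividend is enough here. The idea is to do the division in 64 bits and see whether the quotient still fits in a signed 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the post): returns true if a signed
 * 32-bit divide of dividend by divisor would take the divide fault. */
bool idiv32_would_fault(int32_t dividend, int32_t divisor)
{
    /* Case 1 from the quoted text: a zero divisor. */
    if (divisor == 0)
        return true;

    /* Divide in 64 bits, where INT32_MIN / -1 is representable. */
    int64_t quotient = (int64_t)dividend / (int64_t)divisor;

    /* Case 2: the quotient is too large for the destination register. */
    return quotient > INT32_MAX || quotient < INT32_MIN;
}
```

Under these assumptions, idiv32_would_fault(INT32_MIN, -1) and idiv32_would_fault(1, 0) both report a fault, which matches how a single interrupt ends up covering both cases.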
This is fine as far as C goes, because C says such behavior is “undefined” and thus, anything goes.
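If you would rather not leave it to “anything goes,” both undefined cases can be tested for before the division ever happens. This is only a sketch: checked_div() is a made-up name, not part of the C library, and returning a flag is just one of several reasonable ways to report the problem.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical guarded version of foo(): stores a / b in *result and
 * returns true, or returns false when a / b is undefined in C. */
bool checked_div(int a, int b, int *result)
{
    if (b == 0)
        return false;               /* division by zero */
    if (a == INT_MIN && b == -1)
        return false;               /* quotient does not fit in an int */

    *result = a / b;
    return true;
}
```

With this, the call from main() would become something like checked_div(INT_MIN, -1, &r), which returns false instead of dumping core.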
Simple once you know what's going on.
(And for the 99% of my readership who don't get the NetHack reference in the title … )