I'm terribly upset that GCC didn't start NetHack

Tuesday, January 29, 2013

Ah yes, undefined behavior of C. It's easy to see in retrospect, but it's still a bit surprising. The code:

#include <limits.h>

int foo(int a,int b)
{
  return a / b;
}

int main(void)
{
  return foo(INT_MIN,-1);
}

And when you compile with the right options …

[spc]lucy:/tmp>gcc -std=c99 -O0 crash.c
[spc]lucy:/tmp>./a.out
Floating point exception (core dumped)
[spc]lucy:/tmp>

What's actually going on here? Well, I compiled the code with crashreport(), so I could capture the crash:

CRASH(9372/000): pid=9372 signal='Floating point exception'
CRASH(9372/001): reason='Integer divide-by-zero'
CRASH(9372/002): pc=0x804883d
CRASH(9372/003): CS=0073 DS=007B ES=007B FS=0000 GS=0033
CRASH(9372/004): EIP=0804883D EFL=00010296 ESP=BFFC370C EBP=BFFC3710 ESI=BFFC37C4 EDI=BFFC3750
CRASH(9372/005): EAX=80000000 EBX=00CBBFF4 ECX=BFFC371C EDX=FFFFFFFF
CRASH(9372/006): UESP=BFFC370C TRAPNO=00000000 ERR=00000000
CRASH(9372/007): STACK DUMP
CRASH(9372/008):        BFFC370C:                                     1C 37 FC BF 
CRASH(9372/009):        BFFC3710: 38 37 FC BF 7C 88 04 08 00 00 00 80 FF FF FF FF 
CRASH(9372/010):        BFFC3720: F4 BF CB 00 F4 BF CB 00 F8 A9 04 08 F4 BF CB 00 
CRASH(9372/011):        BFFC3730: 00 00 00 00 A0 CC B8 00 98 37 FC BF 93 4E BA 00 
CRASH(9372/012):        BFFC3740: 01 00 00 00 C4 37 FC BF CC 37 FC BF 26 22 B8 00 
CRASH(9372/013):        BFFC3750: F4 BF CB 00 00 00 00 00 50 37 FC BF 98 37 FC BF 
CRASH(9372/014):        BFFC3760: 40 37 FC BF 55 4E BA 00 00 00 00 00 00 00 00 00 
CRASH(9372/015):        BFFC3770: 00 00 00 00 D4 CF B8 00 01 00 00 00 80 87 04 08 
CRASH(9372/016):        BFFC3780: 00 00 00 00 60 21 B8 00 B0 2C B8 00 D4 CF B8 00 
CRASH(9372/017):        BFFC3790: 01 00 00 00 80 87 04 08 00 00 00 00 A1 87 04 08 
CRASH(9372/018):        BFFC37A0: 47 88 04 08 01 00 00 00 C4 37 FC BF CC 92 04 08 
CRASH(9372/019):        BFFC37B0: 20 93 04 08 B0 2C B8 00 BC 37 FC BF 92 9A B8 00 
CRASH(9372/020):        BFFC37C0: 01 00 00 00 BB A9 FF BF 00 00 00 00 C3 A9 FF BF 
CRASH(9372/021):        BFFC37D0: E4 A9 FF BF F4 A9 FF BF FF A9 FF BF 0D AA FF BF 
CRASH(9372/022):        BFFC37E0: 34 AA FF BF 51 AA FF BF 79 AA FF BF 95 AA FF BF 
CRASH(9372/023):        BFFC37F0: A7 AA FF BF B8 AA FF BF CE AA FF BF EC AA FF BF 
CRASH(9372/024):        BFFC3800: F5 AA FF BF 04 AB FF BF C7 AC FF BF             
CRASH(9372/025): STACK TRACE
CRASH(9372/026):        ./a.out[0x804889c]
CRASH(9372/027):        ./a.out[0x8049078]
CRASH(9372/028):        /lib/tls/libc.so.6[0xbb79b0]
CRASH(9372/029):        ./a.out[0x804887c]
CRASH(9372/030):        /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xba4e93]
CRASH(9372/031):        ./a.out[0x80487a1]
CRASH(9372/032): DONE

And from there, we can load up the program and do some disassembly:

[spc]lucy:/tmp>gdb a.out
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) disassemble 0x804883d
Dump of assembler code for function foo:
0x08048828 <foo+0>:     push   %ebp
0x08048829 <foo+1>:     mov    %esp,%ebp
0x0804882b <foo+3>:     sub    $0x4,%esp
0x0804882e <foo+6>:     mov    0x8(%ebp),%edx
0x08048831 <foo+9>:     lea    0xc(%ebp),%eax
0x08048834 <foo+12>:    mov    %eax,0xfffffffc(%ebp)
0x08048837 <foo+15>:    mov    %edx,%eax
0x08048839 <foo+17>:    mov    0xfffffffc(%ebp),%ecx
0x0804883c <foo+20>:    cltd   
0x0804883d <foo+21>:    idivl  (%ecx)
0x0804883f <foo+23>:    mov    %eax,0xfffffffc(%ebp)
0x08048842 <foo+26>:    mov    0xfffffffc(%ebp),%eax
0x08048845 <foo+29>:    leave  
0x08048846 <foo+30>:    ret    
End of assembler dump.
(gdb)

It faulted on the IDIV instruction, but it wasn't technically an “integer divide-by-zero.” The Intel 80386 book I have (and the Pentium™ in my computer is little more than a glorified Intel 80386) describes IDIV as:

An 80386 interrupt zero (0) [which is reported as an “Integer division-by-zero”] is taken if a zero divisor or a quotient too large for the destination register is generated. [emphasis added]

Now, EAX is -2,147,483,648 (80000000 in hexadecimal notation), which can be represented in 32 bits. (We're running 32-bit code here; the issue still happens on 64-bit systems, but the value will be vastly larger.) But -2,147,483,648 divided by -1 should be 2,147,483,648, and 2,147,483,648 cannot be represented in 32 bits [Technically, the value can be represented in 32 bits, but the instruction in question, IDIV, is a signed instruction, and because of the way Intel does signed integer math, 2,147,483,648 cannot be represented as a 32-bit signed quantity. - Editor]. Thus, because the quotient is considered “too large,” we get the fault which ends the program.
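To make the arithmetic concrete, here is a small sketch (not part of the original crash.c) that does the same division in a wider type, where the quotient does fit. The names wide_quotient() and quotient_fits_int() are made up for illustration, and the sketch assumes long long is wider than int, which holds on the systems discussed here.

```c
#include <limits.h>

/* Hypothetical helper: performs the division in long long, so that
   INT_MIN / -1 yields INT_MAX + 1 instead of overflowing.
   The caller must make sure b is not zero. */
long long wide_quotient(int a, int b)
{
  return (long long)a / (long long)b;
}

/* Hypothetical helper: returns 1 if the mathematically correct quotient
   fits in an int, 0 otherwise. The caller must make sure b is not zero. */
int quotient_fits_int(int a, int b)
{
  long long q = wide_quotient(a, b);
  return q >= INT_MIN && q <= INT_MAX;
}
```

For INT_MIN and -1, the wide quotient is one more than INT_MAX, which is exactly the “too large” case that IDIV complains about.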

This is fine as far as C goes, because C says such behavior is “undefined” and thus, anything goes.
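If you'd rather not leave it up to the compiler and the CPU, one way out is to check for the two problem cases before dividing. This is only a sketch; checked_div() is a hypothetical name, not something from the original program or from any library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a / b in *quotient and returns true,
   or returns false (leaving *quotient alone) when the division is
   not defined in C. */
bool checked_div(int a, int b, int *quotient)
{
  if (b == 0)
    return false;              /* actual division by zero */
  if (a == INT_MIN && b == -1)
    return false;              /* quotient would be INT_MAX + 1 */
  *quotient = a / b;
  return true;
}
```

Calling checked_div(INT_MIN,-1,&q) returns false instead of trapping; what the caller does about it is a separate decision.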

Simple once you know what's going on.

(And for the 99% of my readership who don't get the NetHack reference in the title … )