Monday, September 07, 2015
Some more useless µbenchmarks checking for integer overflow
Using the INTO instruction to check for overflow was dog slow,
so what about using JO?
Will that be slow?
The results speak for themselves (reminder: the expressions are compiled and run 1,000,000 times):
overflow check | method | time | result |
---|---|---|---|
true | INTO | 0.009080000 | 1 |
true | JO | 0.006808000 | 1 |
false | - | 0.005938000 | 1 |
overflow check | method | time | result |
---|---|---|---|
true | INTO | 0.079844000 | 46 |
true | JO | 0.030274000 | 46 |
false | - | 0.030245000 | 46 |
Even though the code using the JO instruction is longer than either of the other versions:
```asm
        xor     eax,eax
        mov     ax,0x1
        add     ax,1
        jo      error
        add     ax,1
        jo      error
        add     ax,1
        jo      error
        add     ax,1
        jo      error
        add     ax,1
        jo      error
        imul    100
        jo      error
        mov     bx,13
        cwd
        idiv    bx
        jo      error
        mov     [$0804F58E],ax
        ret
error:  into
        ret
```
it performed about the same as the version without overflow checks.
That's probably because the JO branches are almost never taken, so the branch predictor guesses them correctly and they add very little overhead.
One thing to notice,
however,
is that were a compiler to go down this path and check explicitly for overflow,
not only would the code be larger,
but it might also run a bit slower overall, because some commonly used optimizations
(at least on the x86 architecture)
can no longer be used.
For instance,
a cheap way to multiply a value by 5 is to skip the IMUL instruction and instead do LEA EAX,[EAX*4 + EAX],
but LEA does not set the overflow flag.
Doing two INC EAX in a row is smaller than (and just as fast as) doing ADD EAX,2,
but while the INC instruction does set the overflow flag,
you have to check the flag after each INC
or you could miss an actual overflow,
which defeats the purpose of using INC to generate smaller code.
And one more thing before I go,
and this is about DynASM: it's not stated anywhere,
but if you use local labels,
you have to call dasm_setupglobal()
or else the program will crash.
I found this out the hard way.