Wednesday, November 28, 2007
Stupid multithreaded benchmarks
Dan Lyke ran a
stupid benchmark—just incrementing a variable to a billion, one just
flat out, and one between some pthreads
locking primitives.
I wanted to see what stupid results I would get if I used spin locks. First, the relevant bit
of code (used nasm
to compile it under a
2.6GHz dual-core Pentium running Linux
2.6):
bits 32 global t1 global t2 section .data gv dd 0 glock dd 0 section .text align 16 t1: mov eax,[gv] inc eax mov [gv],eax cmp eax,1000000000 jl t1 ; ret ; the following implements _exit() ; for the multithreaded version xor ebx,ebx mov eax,252 int $80 mov eax,1 int $80 hlt align 16 t2: mov al,1 t2.wait: xchg al,[glock] or al,al jne t2.wait t2.here: mov eax,[gv] inc eax mov [gv],eax mov byte [glock],0 cmp eax,1000000000 jl t2 ;ret xor ebx,ebx mov eax,252 int $80 mov eax,1 int $80 hlt
Straightforward implementations here. t1()
is the straight
through counting routine, while t2()
is the one with the spin
lock. Running single threaded yielded these results:
routine | time to execute |
---|---|
t1() | 2.454s |
t2() | 39.752s |
While I expected the spin lock to be faster than the pthread
locking, I wasn't expecting it to be this slow. But maybe, just
maybe, I'll get some of that speed back by running dual threads. At the
very least, it should be a bit faster than single core, right?
Right?
routine | time to execute |
---|---|
t1() | 0m10.334s |
t2() | 2m31.307s |
Um …
Wow.
I didn't expect spinlocks to be so expensive.
Ouch.